Abstract

Two-player zero-sum games of infinite duration and their quantitative versions are used in verification to model the interaction between a controller (Eve) and its environment (Adam). The question usually addressed is that of the existence (and computability) of a strategy for Eve that can maximize her payoff against any strategy of Adam. In this work, we are interested in strategies of Eve that minimize her regret, i.e. strategies that minimize the difference between her actual payoff and the payoff she could have achieved if she had known the strategy of Adam in advance. We give algorithms to compute the strategies of Eve that ensure minimal regret against an adversary whose choice of strategy is (i) unrestricted, (ii) limited to positional strategies, or (iii) limited to word strategies, and show that the last two cases have natural modelling applications. These results apply to quantitative games defined with the classical payoff functions Inf, Sup, LimInf, LimSup, and mean-payoff. We also show that our notion of regret minimization in which Adam is limited to word strategies generalizes the notion of good for games introduced by Henzinger and Piterman, and is related to the notion of determinization by pruning due to Aminof, Kupferman and Lampert.


1 Introduction 

The model of two-player games played on graphs is an adequate mathematical tool to solve important problems in computer science, and in particular the reactive-system synthesis problem [PR89]. In that context, the game models the non-terminating interaction between the system to synthesize and its environment. Games with quantitative objectives are useful to formalize important quantitative aspects such as mean-response time or energy consumption. They have attracted a lot of attention recently, see e.g. [CDHR10, BCD+11]. Most of the contributions in this context are for zero-sum games: the objective of Eve (that models the system) is to maximize the value of the game while the objective of Adam (that models the environment) is to minimize this value. This is a worst-case assumption: because the cooperation of the environment cannot be assumed, we postulate that it is antagonistic.

In this antagonistic approach, the main solution concept is that of a winning strategy. Given a 
threshold value, a winning strategy for Eve ensures a minimal value greater than the threshold against 
any strategy of Adam. However, sometimes there are no winning strategies. What should the behaviour 
of the system be in such cases? There are several possible answers to this question. One is to consider 
non-zero sum extensions of those games: the environment (Adam) is not completely antagonistic, rather 
it has its own specification. In such games, a strategy for Eve must be winning only when the outcome 
satisfies the objectives of Adam, see e.g. [CDFR14] . Another option for Eve is to play a strategy which 
minimizes her regret. The regret is informally defined as the difference between what a player actually 
wins and what she could have won if she had known the strategy chosen by the other player. Minimization 
of regret is a central concept in decision theory [Bel82]. This notion is important because it usually leads
to solutions that agree with common sense. 

Let us illustrate the notion of regret minimization on the example of Fig. 1. In this example, Eve owns the squares and Adam owns the circles (we do not use the letters labelling edges for the moment).

Figure 1: Example weighted arena $G_0$.


The game is played for infinitely many rounds and the value of a play for Eve is the long-run average of the values of edges traversed during the play (the so-called mean-payoff). In this game, Eve is only able to secure a mean-payoff of 1/2 when Adam is fully antagonistic. Indeed, if Eve (from $v_1$) plays to $v_2$ then Adam can force a mean-payoff value of 0, and if she plays to $v_3$ then the mean-payoff value is at least 1/2. Note also that if Adam is not fully antagonistic, then the mean-payoff could be as high as 2. Now, assume that Eve does not try to force the highest value in the worst case but tries to minimize her regret. If she plays $v_1 \to v_2$ then the regret is equal to 1. This is because Adam can play the following strategy: if Eve plays to $v_2$ (from $v_1$) then he plays $v_2 \to v_1$ (giving a mean-payoff of 0), and if Eve plays to $v_3$ then he plays to $v_5$ (giving a mean-payoff of 1). If she plays $v_1 \to v_3$ then her regret is 3/2, since Adam can play the symmetric strategy. It should thus be clear that the strategy of Eve which always chooses $v_1 \to v_2$ is indeed minimizing her regret.
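The regret values 1 and 3/2 above can be recomputed mechanically. The sketch below is a simplification and an assumption of ours: it does not encode the arena of Fig. 1, only the mean-payoff values the text attributes to each of Adam's responses after $v_2$ and after $v_3$, and it compares only Eve's two positional choices at $v_1$. The names `OUTCOMES` and `regret_of_choice` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Mean-payoff reached after each of Adam's possible responses, as stated in the
# text for Fig. 1 (an abstraction of the arena, not the arena itself):
# after v2 he either returns to v1 (value 0) or not (value 2);
# after v3 the value is 1/2 or 1.
OUTCOMES = {
    "v2": [Fraction(0), Fraction(2)],
    "v3": [Fraction(1, 2), Fraction(1)],
}


def regret_of_choice(choice: str) -> Fraction:
    """Regret of the positional strategy of Eve that always plays v1 -> choice.

    An Adam strategy fixes one response after v2 and one after v3; against it,
    Eve's best alternative is simply the better of the two branches."""
    worst = Fraction(0)
    for r2, r3 in product(OUTCOMES["v2"], OUTCOMES["v3"]):
        values = {"v2": r2, "v3": r3}
        worst = max(worst, max(values.values()) - values[choice])
    return worst


regrets = {choice: regret_of_choice(choice) for choice in OUTCOMES}
best = min(regrets, key=regrets.get)
print(best, regrets[best])
```

Enumerating Adam's four response pairs gives regret 1 for $v_1 \to v_2$ and 3/2 for $v_1 \to v_3$, so the first choice is the regret-minimizing one, as claimed in the text.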

In this paper, we study three variants of regret minimization, each corresponding to a different set of strategies we allow Adam to choose from. The first variant is when Adam can play any possible strategy (as in the example above), the second variant is when Adam is restricted to playing memoryless strategies, and the third variant is when Adam is restricted to playing word strategies. To illustrate the last two variants, let us consider again the example of Fig. 1. Assume now that Adam is playing memoryless strategies only. Then in this case, we claim that there is a strategy of Eve that ensures regret 0. The strategy is as follows: first play to $v_2$; if Adam chooses to go back to $v_1$, then Eve should henceforth play $v_1 \to v_3$. This strategy has regret 0. Indeed, since Adam is memoryless, his choice at $v_2$ never changes, so when $v_2$ is visited, either Adam does not go back to $v_1$, and then Eve secures a mean-payoff of 2 (which is the maximal possible value), or Adam chooses $v_2 \to v_1$, and then we know that $v_1 \to v_2$ is not a good option for Eve, as cycling between $v_1$ and $v_2$ yields a payoff of only 0. In this case, the mean-payoff is either 1, if Adam plays $v_3 \to v_5$, or 1/2 otherwise. In all the cases, the regret is 0. Let us now turn to the restriction to word strategies for Adam. When considering this restriction, we use the letters that label the edges of the graph. A word strategy for Adam is a function $w : \mathbb{N} \to \{a, b\}$. In this setting Adam plays a sequence of letters and this sequence is independent of the current state of the game. It is more convenient to view the latter as a game played on a weighted automaton (assumed to be total and with at least one transition for every action from every state) in which Adam plays letters and Eve responds by resolving non-determinism. When Adam plays word strategies, the strategy that minimizes regret for Eve is to always play $v_1 \to v_2$. Indeed, for any word in which the letter a appears, the mean-payoff is equal to 2 and the regret is 0, and for any word in which the letter a does not appear, the mean-payoff is 0 while it would have been equal to 1/2 when playing $v_1 \to v_3$. So the regret of this strategy is 1/2 and it is the minimal regret that Eve can secure. Note that the three variants give three different regret values in our example (1, 0 and 1/2 respectively). This is in contrast with the worst-case analysis of the same problem (memoryless strategies suffice for both players).

We claim that at least the last two variants are useful for modelling purposes. For example, the memoryless restriction is useful when designing a system that needs to perform well in an environment which is only partially known. In practical situations, a controller may discover the environment with which it is interacting at run time. Such a situation can be modelled by an arena in which choices in nodes of the environment model an entire family of environments and each memoryless strategy models a specific environment of the family. In such cases, if we want to design a controller that performs reasonably well against all the possible environments, we can consider a controller that minimizes regret: the strategy of the controller will be as close as possible to an optimal strategy if we had known the environment beforehand. This is, for example, the modelling choice made in the famous Canadian traveller's problem [PY91]: a driver is attempting to reach a specific location while ensuring the traversed distance is not too far from the shortest feasible path. The partial knowledge is due to some roads being closed because of snow. The Canadian traveller, when planning his itinerary, is in fact searching for a strategy to minimize his regret for the shortest path measure against a memoryless adversary who determines the roads that are closed. Similar situations naturally arise when synthesizing controllers for robot motion planning [WET15]. We now illustrate the usefulness of the variant in which Adam is restricted to playing word strategies. Assume that we need to design a system embedded into an environment that produces disturbances: if the sequence of disturbances produced by the environment is independent of the behavior of the system, then it is natural to model this sequence not as a function of the state of the system but as a temporal sequence of events, i.e. a word on the alphabet of the disturbances. Clearly, if the sequences are not the result of an antagonistic process, then minimizing the regret against all disturbance sequences is an adequate solution concept to obtain a reasonable system and may be preferable to a system obtained from a strategy that is optimal under the antagonistic hypothesis.

Payoff type | Any strategy | Memoryless strategies | Word strategies
Sup, Inf, LimSup | PTIME-c (Thm 1) | PSPACE (Lem 4), coNP-h (Lem ?) | EXPTIME-c (Thm 3)
LimInf | PTIME-c (Thm 1) | PSPACE-c (Thm 2) | EXPTIME-c (Thm 3)
$\underline{\mathrm{MP}}$, $\overline{\mathrm{MP}}$ | MP equiv. (Thm 1) | PSPACE-c (Thm 2) | Undecidable (Lem ?)

Table 1: Complexity of deciding the regret threshold problem.

Contributions. In this paper, we provide algorithms to solve the regret threshold problem (strict and 
non-strict) in the three variants explained above, i.e. given a game and a threshold, does there exist a 
strategy for Eve with a regret that is (strictly) less than the threshold against all (resp. all memoryless, 
resp. all word) strategies for Adam. It is worth mentioning that, in the first two cases we consider, 
we actually provide algorithms to solve the following search problem: find the controller which ensures 
the minimal possible regret. Indeed, our algorithms are reductions to well-known games and are such 
that a winning strategy for Eve in the resulting game corresponds to a regret-minimizing strategy in 
the original one. Conversely, in games played against word strategies for Adam, we only explicitly solve 
the regret threshold problem. However, since the set of possible regret values of the considered games is 
finite and easy to describe, it will be obvious that one can implement a binary search to find the regret 
value and a corresponding optimal regret-minimizing strategy for Eve. 
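The binary search mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the candidate regret values are given as a sorted list that contains the regret value, and `regret_at_most` is a hypothetical oracle for the non-strict threshold problem, which is monotone in the threshold. Neither name comes from the paper.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def smallest_feasible_threshold(
    candidates: Sequence[T], regret_at_most: Callable[[T], bool]
) -> Optional[T]:
    """Return the least candidate r with regret <= r, or None if there is none.

    `candidates` must be sorted increasingly; `regret_at_most` is a
    hypothetical monotone oracle for the non-strict threshold problem."""
    lo, hi = 0, len(candidates) - 1
    answer = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if regret_at_most(candidates[mid]):
            answer = candidates[mid]  # feasible: look for a smaller threshold
            hi = mid - 1
        else:
            lo = mid + 1  # infeasible: the regret value is larger
    return answer


# Usage with a stand-in oracle for a game whose regret value is 3.
print(smallest_feasible_threshold([0, 1, 2, 3, 4, 5, 6, 7], lambda r: r >= 3))
```

Only a logarithmic number of oracle calls in the size of the candidate set is needed. In the setting of the paper, each call would be one instance of the threshold problem.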

We study this problem for six common quantitative measures: Inf, Sup, LimInf, LimSup, $\underline{\mathrm{MP}}$, $\overline{\mathrm{MP}}$. For all measures but MP, the strict and non-strict threshold problems are equivalent. We state our results for both cases for consistency. In almost all the cases, we provide matching lower bounds showing the worst-case optimality of our algorithms. Our results are summarized in Table 1. For the variant in which Adam plays word strategies only, we show that we can recover decidability of mean-payoff objectives when the memory of Eve is fixed in advance: in this case, the problem is NP-complete (Theorems ? and 5).

Related works. The notion of regret minimization is a central one in game theory, see e.g. [ZJBP08] and references therein. Also, iterated regret minimization has been recently proposed by Halpern et al. as a concept for non-zero-sum games [HP12]. There, it is applied to matrix games and not to game graphs. In a previous contribution, we have applied the iterated regret minimization concept to non-zero-sum games played on weighted graphs for the shortest path problem [FGR10]. Restrictions on how Adam is allowed to play were not considered there. As we do not consider an explicit objective for Adam, we do not consider iteration of the regret minimization here.

The disturbance-handling embedded system example was first given in [DF11]. In that work, the authors introduce remorse-free strategies, which correspond to strategies which minimize regret in games with ω-regular objectives. They do not establish lower bounds on the complexity of realizability or synthesis of remorse-free strategies and they focus on word strategies of Adam only.

In [HP06], Henzinger and Piterman introduce the notion of good for games automata. A non-deterministic automaton is good for solving games if it fairly simulates the equivalent deterministic automaton. We show that our notion of regret minimization for word strategies extends this notion to the quantitative setting (Proposition 3). Our definitions give rise to a natural notion of approximate determinization for weighted automata on infinite words.

In [AKL10], Aminof et al. introduce the notion of approximate determinization by pruning for weighted sum automata over finite words. For $\alpha \in (0, 1]$, a weighted sum automaton is $\alpha$-determinizable by pruning if there exists a finite-state strategy to resolve non-determinism that constructs a run whose value is at least $\alpha$ times the value of the maximal run of the given word. So, they consider a notion of approximation which is a ratio. We will show that our concept of regret, when Adam plays word strategies only, defines instead a notion of approximation with respect to the difference metric for weighted automata (Proposition 2). There are other differences with their work. First, we consider infinite words while they consider finite words. Second, we study a general notion of regret minimization problem in which Eve can use any strategy, while they restrict their study to fixed-memory strategies only and leave the problem open when the memory is not fixed a priori.

Finally, the main difference between these related works and this paper is that we study the Inf, Sup, LimInf, LimSup, $\underline{\mathrm{MP}}$, $\overline{\mathrm{MP}}$ measures while they consider the total sum measure or qualitative objectives.

2 Preliminaries 

A weighted arena is a tuple $G = (V, V_\exists, E, w, v_I)$ where $(V, E, w)$ is a finite edge-weighted graph¹ with integer weights, $V_\exists \subseteq V$, and $v_I \in V$ is the initial vertex. In the sequel we depict vertices owned by Eve (i.e. $V_\exists$) with squares and vertices owned by Adam (i.e. $V \setminus V_\exists$) with circles. We denote the maximum absolute value of a weight in a weighted arena by $W$.

A play in a weighted arena is an infinite sequence of vertices $\pi = v_0 v_1 \dots$ where $v_0 = v_I$ and $(v_i, v_{i+1}) \in E$ for all $i$. We extend the weight function to partial plays by setting $w(\langle v_i \rangle_{i=k}^{l}) = \sum_{i=k}^{l-1} w(v_i, v_{i+1})$.

A strategy for Eve (Adam) is a function $\sigma$ that maps partial plays ending with a vertex $v$ in $V_\exists$ ($V \setminus V_\exists$) to a successor of $v$. A strategy has memory $m$ if it can be realized as the output of a finite
state machine with m states (see e.g. ill PR 11 for a formal definition). A memoryless (or positional) 
strategy is a strategy with memory 1, that is, a function that only depends on the last element of the 
given partial play. A play $\pi = v_0 v_1 \dots$ is consistent with a strategy $\sigma$ for Eve (Adam) if whenever $v_i \in V_\exists$ ($v_i \in V \setminus V_\exists$), $\sigma(\langle v_j \rangle_{j \le i}) = v_{i+1}$. We denote by $\mathfrak{S}_\exists(G)$ ($\mathfrak{S}_\forall(G)$) the set of all strategies for Eve (Adam) and by $\Sigma^m_\exists(G)$ ($\Sigma^m_\forall(G)$) the set of all strategies for Eve (Adam) in $G$ that require memory of size at most $m$; in particular $\Sigma^1_\exists(G)$ ($\Sigma^1_\forall(G)$) is the set of all memoryless strategies of Eve (Adam) in $G$. We omit $G$ if the context is clear.

Payoff functions. A play in a weighted arena defines an infinite sequence of weights. We define below several classical payoff functions that map such sequences to real numbers.² Formally, for a play $\pi = v_0 v_1 \dots$ we define:

• the Inf (Sup) payoff is the minimum (maximum) weight seen along a play: $\mathrm{Inf}(\pi) = \inf\{w(v_i, v_{i+1}) \mid i \ge 0\}$ and $\mathrm{Sup}(\pi) = \sup\{w(v_i, v_{i+1}) \mid i \ge 0\}$;

• the LimInf (LimSup) payoff is the minimum (maximum) weight seen infinitely often: $\mathrm{LimInf}(\pi) = \liminf_{i \to \infty} w(v_i, v_{i+1})$ and, respectively, $\mathrm{LimSup}(\pi) = \limsup_{i \to \infty} w(v_i, v_{i+1})$;

• the mean-payoff value of a play, i.e. the limiting average weight, defined using liminf or limsup since the running averages might not converge: $\underline{\mathrm{MP}}(\pi) = \liminf_{k \to \infty} \frac{1}{k} w(\langle v_i \rangle_{i \le k})$ and $\overline{\mathrm{MP}}(\pi) = \limsup_{k \to \infty} \frac{1}{k} w(\langle v_i \rangle_{i \le k})$. In words, $\underline{\mathrm{MP}}$ corresponds to the limit inferior of the average weight of increasingly longer prefixes of the play while $\overline{\mathrm{MP}}$ is defined as the limit superior of that same sequence.
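For an ultimately periodic (lasso-shaped) play, all six payoffs can be read off the weights of the prefix and of the repeated cycle. The sketch below assumes the play is given in that form, which covers plays produced by finite-memory strategies; the function name and dictionary keys are our own.

```python
from fractions import Fraction
from typing import Dict, Sequence


def lasso_payoffs(prefix: Sequence[int], cycle: Sequence[int]) -> Dict[str, Fraction]:
    """Payoffs of the weight sequence prefix . cycle . cycle . cycle ...

    On such a play the running averages converge, so the lim inf and lim sup
    mean-payoffs coincide and equal the average weight of the cycle."""
    if not cycle:
        raise ValueError("the cycle must be non-empty")
    seen = list(prefix) + list(cycle)  # every weight that occurs at least once
    mean = Fraction(sum(cycle), len(cycle))
    return {
        "Inf": Fraction(min(seen)),
        "Sup": Fraction(max(seen)),
        "LimInf": Fraction(min(cycle)),  # only cycle weights occur infinitely often
        "LimSup": Fraction(max(cycle)),
        "MP_liminf": mean,
        "MP_limsup": mean,
    }


print(lasso_payoffs([5, -3], [1, 2, 0]))
```

The example illustrates why Inf and Sup are not prefix-independent: the prefix weights 5 and -3 determine them, while they have no effect on the four other payoffs.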

¹ W.l.o.g. $G$ is assumed to be total: for each $v \in V$, there exists $v' \in V$ such that $(v, v') \in E$.

² The values of all these functions are finite, and therefore in $\mathbb{R}$, since we deal with finite graphs only.
A payoff function Val is prefix-independent if for all plays $\pi = v_0 v_1 \dots$ and for all $i \ge 0$, $\mathrm{Val}(\pi) = \mathrm{Val}(v_i v_{i+1} \dots)$. It is well-known that LimInf, LimSup, $\underline{\mathrm{MP}}$, and $\overline{\mathrm{MP}}$ are prefix-independent. Often, the arguments that we develop work uniformly for these four measures because they are prefix-independent. Inf and Sup are not prefix-independent, but often in the sequel we apply a simple transformation to the game and encode Inf into a LimInf objective, and Sup into a LimSup objective. The transformation consists of encoding in the vertices of the arena the minimal (maximal) weight that has been witnessed by a play, and labelling the edges of the new graph with this same recorded weight. When this simple transformation does not suffice, we mention it explicitly.

Regret. Consider a fixed weighted arena $G$ and payoff function Val. Given strategies $\sigma$, $\tau$ for Eve and Adam respectively, and $v \in V$, we denote by $\pi^v_{\sigma\tau}$ the unique play starting from $v$ that is consistent with $\sigma$ and $\tau$, and denote its value by $\mathrm{Val}^v_G(\sigma, \tau) := \mathrm{Val}(\pi^v_{\sigma\tau})$. We omit $G$ if it is clear from the context. If $v$ is omitted, it is assumed to be $v_I$.

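Let $\Sigma_\exists \subseteq \mathfrak{S}_\exists$ and $\Sigma_\forall \subseteq \mathfrak{S}_\forall$ be sets of strategies for Eve and Adam respectively. Given $\sigma \in \Sigma_\exists$ we define the regret of $\sigma$ in $G$ w.r.t. $\Sigma_\exists$ and $\Sigma_\forall$ as:

$$\mathrm{reg}^{\sigma}_{\Sigma_\exists,\Sigma_\forall}(G) := \sup_{\tau \in \Sigma_\forall} \Big( \sup_{\sigma' \in \Sigma_\exists} \mathrm{Val}(\sigma', \tau) - \mathrm{Val}(\sigma, \tau) \Big).$$

We define the regret of $G$ w.r.t. $\Sigma_\exists$ and $\Sigma_\forall$ as:

$$\mathrm{Reg}_{\Sigma_\exists,\Sigma_\forall}(G) := \inf_{\sigma \in \Sigma_\exists} \mathrm{reg}^{\sigma}_{\Sigma_\exists,\Sigma_\forall}(G).$$

When $\Sigma_\exists$ or $\Sigma_\forall$ are omitted from $\mathrm{reg}(\cdot)$ and $\mathrm{Reg}(\cdot)$ they are assumed to be the set of all strategies for Eve and Adam.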
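When both strategy sets are finite, the suprema and infima in these definitions become maxima and minima over a table of values $\mathrm{Val}(\sigma, \tau)$. The sketch below assumes such a table is already available; the strategy names and values in the example are invented for illustration.

```python
def regret_of(sigma, eve, adam, val):
    """reg^sigma: over all tau, how much better the best alternative sigma' does.

    `val[(s, t)]` holds Val(s, t); `eve` and `adam` are finite lists of strategies."""
    return max(max(val[(alt, tau)] for alt in eve) - val[(sigma, tau)] for tau in adam)


def regret(eve, adam, val):
    """Reg: the least regret Eve can guarantee (inf/sup become min/max here)."""
    return min(regret_of(sigma, eve, adam, val) for sigma in eve)


# Hypothetical value table: s2 is the worst-case (maximin) choice, yet s1 has less regret.
VAL = {("s1", "t1"): 0, ("s1", "t2"): 10, ("s2", "t1"): 1, ("s2", "t2"): 1}
EVE, ADAM = ["s1", "s2"], ["t1", "t2"]
print(regret_of("s1", EVE, ADAM, VAL), regret_of("s2", EVE, ADAM, VAL), regret(EVE, ADAM, VAL))
```

Because $\sigma' = \sigma$ is always among the alternatives, every regret computed this way is non-negative. The example also shows that minimizing regret and maximizing the worst-case value can select different strategies.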

Remark 1 (Ratio vs. difference). Let $G$ be a weighted arena, $\Sigma_\exists \subseteq \mathfrak{S}_\exists$ and $\Sigma_\forall \subseteq \mathfrak{S}_\forall$. Consider the regret of $G$ defined using the ratio measure instead of the difference. For Inf, Sup, LimInf, and LimSup, deciding if the regret of $G$ is (strictly) less than a given threshold $r$ reduces (in polynomial time) to deciding the same problem in $G_{\log}$ (which is obtained by replacing every weight $x$ in $G$ with $\log_2 x$) for threshold $\log_2 r$ with the difference measure.
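The reduction can be checked by a short derivation, under two assumptions that the remark leaves implicit: all weights of $G$ are strictly positive (so that $\log_2$ is defined), and the ratio regret is obtained from the definition of $\mathrm{reg}$ by replacing the difference with a quotient. Since Inf, Sup, LimInf and LimSup each return a weight of the play or a limit of weights, and $\log_2$ is continuous and strictly increasing, taking logarithms commutes with every supremum and infimum involved (with $\sigma, \sigma' \in \Sigma_\exists$ and $\tau \in \Sigma_\forall$):

```latex
\mathrm{Val}_{G_{\log}}(\sigma,\tau) = \log_2 \mathrm{Val}_G(\sigma,\tau),
\qquad
\log_2 \Big( \inf_{\sigma} \sup_{\tau}
  \frac{\sup_{\sigma'} \mathrm{Val}_G(\sigma',\tau)}{\mathrm{Val}_G(\sigma,\tau)} \Big)
= \inf_{\sigma} \sup_{\tau} \Big( \sup_{\sigma'} \mathrm{Val}_{G_{\log}}(\sigma',\tau)
  - \mathrm{Val}_{G_{\log}}(\sigma,\tau) \Big)
= \mathrm{Reg}_{\Sigma_\exists,\Sigma_\forall}(G_{\log}).
```

Hence the ratio regret of $G$ is (strictly) below $r$ exactly when the difference regret of $G_{\log}$ is (strictly) below $\log_2 r$.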

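We will make use of two other values associated with the vertices of an arena: the antagonistic and cooperative values, defined for plays from a vertex $v \in V$ as

$$\mathrm{aVal}^v(G) := \sup_{\sigma \in \mathfrak{S}_\exists} \inf_{\tau \in \mathfrak{S}_\forall} \mathrm{Val}^v(\sigma, \tau) \qquad \mathrm{cVal}^v(G) := \sup_{\sigma \in \mathfrak{S}_\exists} \sup_{\tau \in \mathfrak{S}_\forall} \mathrm{Val}^v(\sigma, \tau).$$

When clear from context $G$ will be omitted, and if $v$ is omitted it is assumed to be $v_I$.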

Remark 2. It is well-known that cVal and aVal can be computed in polynomial time, w.r.t. the underlying graph of the given arena, for all payoff functions but MP [CdAHS03, CDH10]. For MP, cVal is known to be computable in polynomial time, while aVal can be computed in UP ∩ coUP [Jur98] and in pseudo-polynomial time [ZP96, BCD+11].

3 Variant I: Adam plays any strategy 

For this variant, we establish that for all the payoff functions that we consider, the problem of computing 
the antagonistic value and the problem of computing the regret value are inter-reducible in polynomial 
time. As a direct consequence, we obtain the following theorem: 

Theorem 1. Deciding if the regret value is less than a given threshold (strictly or non-strictly) is PTIME-complete (under log-space reductions) for Inf, Sup, LimInf, and LimSup, and equivalent to mean-payoff games (under polynomial-time reductions) for $\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$.

Upper bounds. We now describe an algorithm to compute regret for all payoff functions. To do so, we will use the fact that all payoff functions we consider can be assumed to be prefix-independent. Thus, let us first convince the reader that one can, in polynomial time, modify Inf and Sup games so that they become prefix-independent.
Lemma 1. For a given weighted arena $G$ and payoff function Sup: $\mathrm{Reg}(G) = \mathrm{Reg}(G_{\max})$; for payoff function Inf: $\mathrm{Reg}(G) = \mathrm{Reg}(G_{\min})$.

Consider a weighted arena $G = (V, V_\exists, E, w, v_I)$. We describe how to construct $G_{\min}$ from $G$ so that there is a clear bijection between plays in both games defined with the Inf payoff function. The arena $G_{\min}$ consists of the following components:

• $V' = V \times \{w(e) \mid e \in E\}$;

• $V'_\exists = \{(v, n) \in V' \mid v \in V_\exists\}$;

• $v'_I = (v_I, W)$;

• $E' \ni ((u, n), (v, m))$ if and only if $(u, v) \in E$ and $m = \min\{n, w(u, v)\}$;

• $w'((u, n), (v, m)) = m$.

Intuitively, the construction keeps track of the minimal weight witnessed by a play by encoding it into the vertices themselves. It is not hard to see that plays in $G_{\min}$ indeed have a one-to-one correspondence with plays in $G$. Furthermore, the LimInf and LimSup values of a play in $G_{\min}$ are easily seen to be equivalent to the Inf value of the play in $G_{\min}$ and of the corresponding play in $G$.

A similar idea can be used to construct a weighted arena $G_{\max}$ from a Sup game such that the maximal weight is recorded (instead of the minimal).
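The construction of $G_{\min}$ can be sketched as a breadth-first exploration. Two choices here are ours rather than the paper's: only the part of $G_{\min}$ reachable from $v'_I$ is built, and the arena is encoded as a dictionary from edges to integer weights. The function name `build_g_min` is hypothetical.

```python
from collections import deque


def build_g_min(edges, eve, v_init):
    """Reachable part of G_min: vertices (v, n) where n is the minimal weight seen.

    `edges` maps (u, v) to an integer weight; `eve` is the set of Eve's vertices."""
    W = max(abs(x) for x in edges.values())
    succ = {}
    for (u, v) in edges:
        succ.setdefault(u, []).append(v)
    start = (v_init, W)  # initial vertex (v_I, W)
    seen = {start}
    new_edges = {}
    queue = deque([start])
    while queue:
        u, n = queue.popleft()
        for v in succ.get(u, []):
            m = min(n, edges[(u, v)])  # update the recorded minimum
            target = (v, m)
            new_edges[((u, n), target)] = m  # w'((u, n), (v, m)) = m
            if target not in seen:
                seen.add(target)
                queue.append(target)
    eve_vertices = {x for x in seen if x[0] in eve}
    return seen, eve_vertices, new_edges, start


vertices, eve_v, g_min_edges, start = build_g_min(
    {("a", "b"): 3, ("b", "a"): -1, ("b", "b"): 2}, {"a"}, "a"
)
print(sorted(vertices))
```

Along every play of the resulting arena the edge weights are non-increasing and eventually constant, which is why its LimInf (or LimSup) value coincides with the Inf value of the corresponding play in $G$.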

Lemma 2. For payoff functions Inf, Sup, LimInf, LimSup, $\underline{\mathrm{MP}}$, and $\overline{\mathrm{MP}}$, computing the regret of a game is at most as hard as computing the antagonistic value of a (polynomial-size) game with the same payoff function.


We now describe the construction used to prove Lemma 2.

Let us fix a weighted arena $G$. We define a new weight function $w'$ as follows. For any edge $e = (u, v)$ let $w'(e) = -\infty$ if $u \in V \setminus V_\exists$, and if $u \in V_\exists$ then $w'(e) = \max\{\mathrm{cVal}^{v'} \mid (u, v') \in E \setminus \{e\}\}$. Intuitively, $w'$ represents the best value obtainable by a strategy of Eve that differs at the given edge. It is not difficult to see that in order to minimize regret, Eve is trying to minimize the difference between the value given by the original weight function $w$ and the value given by $w'$. Let $\mathrm{Range}(w')$ be the set of values $\{w'(e) \mid e \in E\}$. For $b \in \mathrm{Range}(w')$ we define $G^b$ to be the graph obtained by restricting $G$ (the original weighted arena with weight function $w$) to edges $e$ with $w'(e) \le b$.
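The weight function $w'$ and the restricted graphs $G^b$ can be sketched as follows. The cooperative values $\mathrm{cVal}^{v}$ are taken as given input (computing them is the problem discussed in Remark 2), and we assume the convention $\max \emptyset = -\infty$ for an Eve vertex with a single successor, which the text does not specify. The function names are hypothetical.

```python
import math


def deviation_weights(edges, eve, c_val):
    """w'(e): -inf on Adam edges; on an Eve edge (u, v), the best cooperative
    value among the other successors of u (-inf if there is none).

    `edges` maps (u, v) to a weight; `c_val` maps each vertex v to cVal^v."""
    w_prime = {}
    for (u, v) in edges:
        if u not in eve:
            w_prime[(u, v)] = -math.inf
        else:
            others = [c_val[x] for (y, x) in edges if y == u and (y, x) != (u, v)]
            w_prime[(u, v)] = max(others, default=-math.inf)
    return w_prime


def restrict(edges, w_prime, b):
    """G^b keeps exactly the edges whose deviation weight is at most b."""
    return {e: w for e, w in edges.items() if w_prime[e] <= b}


E = {("u", "a"): 1, ("u", "b"): 0, ("a", "a"): 2, ("b", "b"): -1}
wp = deviation_weights(E, {"u"}, {"u": 2, "a": 2, "b": -1})
print(wp, restrict(E, wp, -1))
```

Since Adam's edges always get $-\infty$ and every Eve edge gets a value at most $b_{\max} = \max(\mathrm{Range}(w') \setminus \{-\infty\})$, restricting with $b = b_{\max}$ keeps every edge, matching the observation $G^{b_{\max}} = G$ used in the proof of Claim 1.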

Next, we will construct a new weighted arena $\hat G$ such that the regret of $G$ is a function of the antagonistic value of $\hat G$. Figure 2 depicts the general form of the arena we construct. We have three vertices $v_0 \in V \setminus V_\exists$ and $v_1, v_\perp \in V_\exists$, and a "copy" of $G$ as $G^b$ for each $b \in \mathrm{Range}(w') \setminus \{-\infty\}$. We have a self-loop of weight 0 on $v_0$, which is the initial vertex of $\hat G$, a self-loop of weight $-2W - 1$ on $v_\perp$, and weight-0 edges from $v_0$ to $v_1$ and from $v_1$ to the initial vertices of $G^b$ for all $b$. Recall that $G^b$ might not be total. To fix this we add, for all vertices without a successor, a weight-0 edge to $v_\perp$. The remainder of the weight function $\hat w$ is defined for each edge $e^b$ in $G^b$ as $\hat w(e^b) = w(e) - b$.

Figure 2: Weighted arena $\hat G$, constructed from $G$. Dotted lines represent several edges added when the condition labelling them is met.

Intuitively, in $\hat G$ Adam first decides whether he can ensure a non-zero regret. If this is the case, then he moves to $v_1$. Next, Eve chooses a maximal value she will allow for strategies which differ from the one she will play (this is the choice of $b$). The play then moves to the corresponding copy of $G$, i.e. $G^b$. She can now play to maximize her mean-payoff value. However, if her choice of $b$ was not correct then the play will end in $v_\perp$.

We show that, for all prefix-independent payoff functions we consider, the following holds:

Claim 1. For all prefix-independent payoff functions considered in this work, $\mathrm{Reg}(G) = -\mathrm{aVal}(\hat G)$.

This implies Lemma 2 for all prefix-independent payoff functions. Together with Lemma 1 we get the same result for Inf and Sup.

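Proof of the Claim. Let us start by arguing that the following equality holds.

$$\mathrm{Reg}(G) = \inf_{\sigma \in \mathfrak{S}_\exists} \sup_{\tau \in \mathfrak{S}_\forall} \sup_{\sigma' \in \mathfrak{S}_\exists \setminus \{\sigma\}} \{0, \mathrm{Val}(\sigma', \tau) - \mathrm{Val}(\sigma, \tau)\}. \qquad (1)$$

Indeed, it follows from the definition of regret that taking $\sigma' = \sigma$ yields a difference of 0, which is why 0 appears in (1). Thus, Adam can always ensure the regret of a game is at least 0. Now, for $b \in \mathrm{Range}(w')$, define $\Sigma_\exists(b) \subseteq \mathfrak{S}_\exists(G)$ as:

$$\Sigma_\exists(b) := \Big\{\sigma \;\Big|\; \sup_{\tau \in \mathfrak{S}_\forall} \sup_{\sigma' \in \mathfrak{S}_\exists \setminus \{\sigma\}} \mathrm{Val}(\sigma', \tau) \le b\Big\}.$$

It is clear from the definitions that $\sigma \in \Sigma_\exists(b)$ if and only if $\sigma$ is a strategy for Eve in $G^b$ which avoids ever reaching $v_\perp$. Now, if we let

$$b_\sigma = \sup_{\tau \in \mathfrak{S}_\forall} \sup_{\sigma' \in \mathfrak{S}_\exists \setminus \{\sigma\}} \mathrm{Val}(\sigma', \tau),$$

then $\sigma \in \Sigma_\exists(b)$ if and only if $b_\sigma \le b$. It follows that for all $\sigma$:

$$\sup_{\tau \in \mathfrak{S}_\forall} \sup_{\sigma' \in \mathfrak{S}_\exists \setminus \{\sigma\}} \mathrm{Val}(\sigma', \tau) = \inf\{b \mid \sigma \in \Sigma_\exists(b)\}. \qquad (2)$$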

We now turn to the mean-payoff game played on $\hat G$, and make some observations about the strategies we need to consider. It is well known that memoryless strategies suffice for either player to ensure an antagonistic value of at least (resp. at most) $\mathrm{aVal}(\hat G)$, for all quantitative games considered in this work, so we can assume that Adam and Eve play positionally. It follows that all plays either remain in $v_0$ or move to $G^b$ for some $b$, and Adam can ensure a non-positive payoff (by remaining in $v_0$). Note that for $b_{\max} = \max(\mathrm{Range}(w') \setminus \{-\infty\})$ we have $G^{b_{\max}} = G$. So the copy of $G^{b_{\max}}$ in $\hat G$ has no edge to $v_\perp$, and by playing to this sub-graph Eve can ensure a payoff of at least $-|b_{\max}| - W \ge -2W$. As any play that reaches $v_\perp$ will have a payoff of $-2W - 1$, we can restrict Eve to strategies which avoid $v_\perp$, and hence all plays either remain in $v_0$ or (eventually) in the copy of $G^b$ for some $b$. Now $G^b$ contains no restrictions for Adam, so we can assume that he plays the same strategy in all the copies of $G^b$ (where he cannot force the play to $v_\perp$), and these strategies have a one-to-one correspondence with strategies in $G$. Likewise, as Eve chooses a unique $G^b$ to play in, we have a one-to-one correspondence between strategies of Eve in $\hat G$ and strategies in $G$. More precisely, if $\hat\sigma \in \mathfrak{S}_\exists(\hat G)$ is such that $\hat\sigma(v_1) = v_I^b$ and $\hat\sigma$ avoids $v_\perp$, then the corresponding strategy $\sigma \in \mathfrak{S}_\exists(G)$ is a valid strategy in $G^b$, and hence:

$$\hat\sigma(v_1) = v_I^b \implies \sigma \in \Sigma_\exists(b). \qquad (3)$$

Figure 3: Gadget to reduce a game to its regret game.

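Now suppose $\hat\sigma \in \mathfrak{S}_\exists(\hat G)$ is a strategy such that $\hat\sigma(v_1) = v_I^b$ and $\hat\sigma$ avoids $v_\perp$, and $\hat\tau \in \mathfrak{S}_\forall(\hat G)$ is a strategy such that $\hat\tau(v_0) = v_1$. Let $\sigma \in \Sigma_\exists(b)$ and $\tau \in \mathfrak{S}_\forall(G)$ be the strategies in $G$ corresponding to $\hat\sigma$ and $\hat\tau$ respectively. It is easy to show that:

$$-\mathrm{Val}_{\hat G}(\hat\sigma, \hat\tau) = b - \mathrm{Val}_G(\sigma, \tau).$$

Putting together Equations (1), (2) and (3) gives:

$$\begin{aligned}
-\mathrm{aVal}(\hat G) &= -\sup_{\hat\sigma} \inf_{\hat\tau} \mathrm{Val}_{\hat G}(\hat\sigma, \hat\tau) \\
&= \inf_{\hat\sigma} \sup\big(\{-\mathrm{Val}_{\hat G}(\hat\sigma, \hat\tau) \mid \hat\tau(v_0) = v_1\} \cup \{0\}\big) \\
&= \inf\big\{\sup\big(\{-\mathrm{Val}_{\hat G}(\hat\sigma, \hat\tau) \mid \hat\tau(v_0) = v_1\} \cup \{0\}\big) \;\big|\; \hat\sigma(v_1) = v_I^b\big\} \\
&= \inf\big\{\sup_{\tau \in \mathfrak{S}_\forall}\big(\{b - \mathrm{Val}_G(\sigma, \tau)\} \cup \{0\}\big) \;\big|\; \sigma \in \Sigma_\exists(b)\big\} \\
&= \inf_{\sigma \in \mathfrak{S}_\exists} \sup_{\tau \in \mathfrak{S}_\forall}\big(\{\inf\{b \mid \sigma \in \Sigma_\exists(b)\} - \mathrm{Val}_G(\sigma, \tau)\} \cup \{0\}\big) \\
&= \inf_{\sigma \in \mathfrak{S}_\exists} \sup_{\tau \in \mathfrak{S}_\forall} \sup_{\sigma' \in \mathfrak{S}_\exists \setminus \{\sigma\}} \{0, \mathrm{Val}_G(\sigma', \tau) - \mathrm{Val}_G(\sigma, \tau)\} \\
&= \mathrm{Reg}(G),
\end{aligned} \qquad (4)$$

as required. □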


Lower bounds. For all the payoff functions, from G we can construct in logarithmic space G' such 
that the antagonistic value of G is a function of the regret value of G', and so we have: 

Lemma 3. For payoff functions Inf, Sup, LimInf, LimSup, $\underline{\mathrm{MP}}$, and $\overline{\mathrm{MP}}$, computing the regret of a game is at least as hard as computing the antagonistic value of a (polynomial-size) game with the same payoff function.
Proof. Suppose $G$ is a weighted arena with initial vertex $v_I$. Consider the weighted arena $G'$ obtained by adding to $G$ the gadget of Figure 3. The initial vertex of $G'$ is set to be $v'_I$. In $G'$, from $v'_I$ Eve can either progress to the original game or to the new gadget, both with weight $L$. We claim that the right choice of values for the parameters $L$, $M_1$, $M_2$, $N_1$, $N_2$ makes it so that the antagonistic value of $G$ is a function of the regret of the game $G'$.

Let us first give the values of $L$, $M_1$, $M_2$, $N_1$, and $N_2$ for each of the payoff functions considered. For all our payoff functions we have $M_1 = M_2 = L$, $N_1 = W + 1$, and $N_2 = -3W - 2$. For Inf we have $L = W$, for Sup we have $L = -W$, and for the remaining payoff functions we have $L = 0$.

The following result shows that computing the regret of $G'$ is at least as hard as computing the (antagonistic) value of $G$.

Claim 2. For all payoff functions:

$$\mathrm{aVal}(G) = W + 1 - \mathrm{Reg}(G').$$

We first observe that for all payoff functions we consider, $\mathrm{aVal}(G)$ and $\mathrm{cVal}(G)$ both lie in $[-W, W]$. At $v'_I$ Eve has a choice: she can choose to remain in the gadget or she can move to the original game $G$. If she chooses to remain in the gadget, her payoff will be $-3W - 2$, meanwhile Adam could choose a strategy that would have achieved a payoff of $\mathrm{cVal}(G)$ if she had chosen to play to $G$. Hence her regret in this case is $\mathrm{cVal}(G) + 3W + 2 \ge 2W + 2$. Otherwise, if she chooses to play to $G$, she can achieve a payoff of at most $\mathrm{aVal}(G)$. As $\mathrm{cVal}(G) \le W$ is the maximum possible payoff achievable in $G$, the alternative strategy which now maximizes Eve's regret is the one which remains in the gadget, giving a payoff of $W + 1$. Her regret in this case is $W + 1 - \mathrm{aVal}(G) \le 2W + 1$. Therefore, to minimize her regret she will move to $G$, and $\mathrm{Reg}(G') = W + 1 - \mathrm{aVal}(G)$. □
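The case analysis of Claim 2 only compares two numbers, which the sketch below replays exhaustively for small integer values of $W$. It assumes, as in the proof, that $\mathrm{aVal}(G) \le \mathrm{cVal}(G)$ and that both lie in $[-W, W]$; it checks the arithmetic only, not the gadget itself, and the function name is hypothetical.

```python
def gadget_regret(a_val, c_val, W):
    """Eve's regret in G' for her two choices at v'_I, following Claim 2."""
    stay_in_gadget = c_val - (-3 * W - 2)  # payoff -3W-2; the alternative reaches cVal(G)
    move_to_g = (W + 1) - a_val            # best alternative: the gadget payoff W+1
    return min(stay_in_gadget, move_to_g)


# Regret of staying is at least 2W+2, regret of moving is at most 2W+1,
# so Eve always moves to G and aVal(G) = W + 1 - Reg(G').
ok = all(
    W + 1 - gadget_regret(a, c, W) == a
    for W in range(1, 6)
    for a in range(-W, W + 1)
    for c in range(a, W + 1)
)
print(ok)
```

The strict gap between $2W + 1$ and $2W + 2$ is what makes the choice of $N_2 = -3W - 2$ sufficient.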

Memory requirements for Eve and Adam. It follows from the reductions underlying the proof of Lemma 2 that Eve only requires positional strategies to minimize regret when there is no restriction on Adam's strategies. On the other hand, for any given strategy $\sigma$ for Eve, the strategy $\tau$ for Adam which witnesses the maximal regret against it consists of a combination of three positional strategies: first he moves to the optimal vertex for deviating (it is from this vertex that the alternative strategy $\sigma'$ of Eve will achieve a better payoff against $\tau$), then he plays his optimal (positional) strategy in the antagonistic game (i.e. against $\sigma$). His strategy for the alternative scenario, i.e. against $\sigma'$, is his optimal strategy in the cooperative game, which is also positional. This combined strategy is clearly realizable as a strategy with three memory states, giving us:

Corollary 1. For payoff functions LimInf, LimSup, $\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$: $\mathrm{Reg}(G) = \mathrm{Reg}_{\Sigma^1_\exists, \Sigma^3_\forall}(G)$.

The algorithm we give relies on the prefix-independence of the payoff function. As the transformation 
from Inf and Sup to equivalent prefix-independent ones is polynomial it follows that polynomial memory 
(w.r.t. the size of the underlying graph of the arena) suffices for both players. 


4 Variant II: Adam plays memoryless strategies 

For this variant, we provide a polynomial space algorithm to solve the problem for all the payoff functions, and then provide lower bounds.

Theorem 2. Deciding if the regret value is less than a given threshold (strictly or non-strictly) playing against memoryless strategies of Adam is PSPACE-complete for LimInf, $\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$; it is in PSPACE and coNP-hard for Inf, Sup and LimSup.

Upper bounds. Let us now show how to compute regret against positional adversaries. 

Lemma 4. For payoff functions Inf, Sup, LimInf, LimSup, $\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$, the regret of a game played against a positional adversary can be computed in polynomial space.

Given a weighted arena $G$, we construct a new weighted arena $\hat{G}$ such that $-\mathrm{aVal}(\hat{G})$ equals the regret of $G$ against positional strategies of Adam.

Figure 4: Example weighted arena $G_1$.

Figure 5: Weighted arena $\hat{G}_1$, constructed from $G_1$ w.r.t. the $\underline{\mathrm{MP}}$ payoff function. In the edge-set component only edges leaving Adam nodes are depicted.


The vertices of $\hat{G}$ encode the choices made by Adam. For a subset of edges $D \subseteq E$, let $G{\restriction}D$ denote the weighted arena $(V, V_\exists, D, w, v_I)$. The new weighted arena $\hat{G}$ is the tuple $(\hat{V}, \hat{V}_\exists, \hat{E}, \hat{w}, \hat{v}_I)$ where (i) $\hat{V} = V \times \mathcal{P}(E)$; (ii) $\hat{V}_\exists = \{(v, D) \in \hat{V} \mid v \in V_\exists\}$; (iii) $\hat{v}_I = (v_I, E)$; (iv) $\hat{E}$ contains the edge $((u, C), (v, D))$ if and only if $(u, v) \in C$ and either $u \in V_\exists$ and $D = C$, or $u \in V \setminus V_\exists$ and $D = C \setminus \{(u, x) \in E \mid x \neq v\}$; (v) $\hat{w}((u, C), (v, D)) = w(u, v) - \mathrm{cVal}(G{\restriction}D)$. The application of this transformation to the graph of Fig. 4 w.r.t. the $\underline{\mathrm{MP}}$ payoff function is given in Fig. 5.
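The construction of $\hat{G}$ can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it only builds the part of $\hat{G}$ reachable from $\hat{v}_I$, edge sets are represented as frozensets, and `cval`, standing for $\mathrm{cVal}(G{\restriction}D)$, is an assumed oracle passed as a parameter (computing it requires solving a cooperative game, which is out of scope here).

```python
from collections import deque


def build_hat_arena(edges, weight, eve_vertices, v_init, cval):
    """Reachable part of the arena hat-G built from G = (V, V_E, E, w, v_I).

    edges: set of pairs (u, v) forming E
    weight: dict mapping each (u, v) in E to w(u, v)
    eve_vertices: set V_E of Eve's vertices
    cval: hypothetical oracle, frozenset D of edges -> cVal(G restricted to D)
    Returns (hat_edges, hat_init), hat_edges mapping ((u, C), (v, D)) to its weight.
    """
    E = frozenset(edges)
    hat_init = (v_init, E)
    hat_edges = {}
    seen = {hat_init}
    queue = deque([hat_init])
    while queue:
        u, C = queue.popleft()
        for x, v in C:
            if x != u:
                continue  # only edges leaving u that are still allowed
            if u in eve_vertices:
                D = C  # Eve's moves do not restrict anything
            else:
                # Adam commits to (u, v): drop every other edge leaving u
                D = C - {(u, y) for (x2, y) in E if x2 == u and y != v}
            target = (v, D)
            hat_edges[((u, C), target)] = weight[(u, v)] - cval(D)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return hat_edges, hat_init
```

For example, with an Eve vertex `s`, an Adam vertex `a` and edges `s→a`, `a→s`, `a→a`, the reachable part has five vertices and six edges: once Adam leaves `a` towards `s` (resp. `a`), the other edge leaving `a` disappears from every later edge set.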

Consider a play $\hat{\pi} = (v_0, C_0)(v_1, C_1)\ldots$ in $\hat{G}$. We denote by $[\hat{\pi}]_k$, for $k \in \{1, 2\}$, the sequence $(c_{k,i})_{i \geq 0}$, where $c_{k,i}$ is the $k$-th component of the $i$-th pair of $\hat{\pi}$. Observe that $[\hat{\pi}]_1$ is a valid play in $G$. Also observe that $E \supseteq C_j \supseteq C_{j+1}$ for all $j$. Hence $[\hat{\pi}]_2$ is an infinite descending chain of finite subsets, which is eventually constant, and therefore $\lim [\hat{\pi}]_2$ is well-defined. Finally, we define $c(\hat{\pi}) := \mathrm{cVal}(G{\restriction}\lim [\hat{\pi}]_2)$. The following result relates the value of a play in $\hat{G}$ to the value of the corresponding play in $G$.

Lemma 5. For payoff functions LimInf, LimSup, $\underline{\mathrm{MP}}$, $\overline{\mathrm{MP}}$ and for any play $\hat{\pi}$ in $\hat{G}$ we have that $\mathrm{Val}(\hat{\pi}) = \mathrm{Val}([\hat{\pi}]_1) - c(\hat{\pi})$.

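Proof. We first establish the following intermediate result. It follows from the existence of $\lim [\hat{\pi}]_2$ and the definition of $c(\cdot)$ that:

$$\limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{cVal}(G{\restriction}C_i) = \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{cVal}(G{\restriction}C_i) = c(\hat{\pi}). \quad (5)$$

We now show that the result holds for $\underline{\mathrm{MP}}$.

$$\begin{aligned}
\mathrm{Val}(\hat{\pi}) &= \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \big(w(v_i, v_{i+1}) - \mathrm{cVal}(G{\restriction}C_{i+1})\big) && \text{(defs. of } \mathrm{Val}(\cdot), \hat{w}) \\
&= \mathrm{Val}([\hat{\pi}]_1) - \limsup_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \mathrm{cVal}(G{\restriction}C_{j+1}) && \text{(def. of } \mathrm{Val}(\cdot)) \\
&= \mathrm{Val}([\hat{\pi}]_1) - c(\hat{\pi}) && \text{(from Eq. (5))}
\end{aligned}$$

The proofs for the other payoff functions are almost identical (for LimInf and LimSup, replace the use of Equation (5) by Equation (6)):

$$\limsup_{i \to \infty} \mathrm{cVal}(G{\restriction}C_i) = \liminf_{i \to \infty} \mathrm{cVal}(G{\restriction}C_i) = c(\hat{\pi}). \quad (6)$$

□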
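For an ultimately periodic (lasso) play, the identity of Lemma 5 for $\underline{\mathrm{MP}}$ can be checked numerically: both mean-payoffs are averages over the cycle, and the edge sets are constant along the cycle because they never grow. The Python sketch below assumes the cycle is given as a list of $\hat{G}$-vertices and takes a hypothetical oracle `cval` for $\mathrm{cVal}(G{\restriction}D)$; it is an illustration, not part of the proof.

```python
from fractions import Fraction


def mean(ws):
    """Mean-payoff (liminf = limsup) of a lasso play: the average over its cycle."""
    return Fraction(sum(ws), len(ws))


def lemma5_mp_sides(cycle, weight, cval):
    """Both sides of Val(pi_hat) = Val([pi_hat]_1) - c(pi_hat) for a lasso of hat-G.

    cycle: hat-G vertices (v, C) of the periodic part, closed by an edge back to cycle[0]
    weight: original weights w(u, v); cval: hypothetical oracle for cVal(G|D)
    """
    succ = cycle[1:] + cycle[:1]
    limit_set = cycle[0][1]
    # Edge sets never grow, so they are constant along the cycle of a lasso.
    assert all(C == limit_set for _, C in cycle)
    hat_w = [weight[(u, v)] - cval(D) for (u, _), (v, D) in zip(cycle, succ)]
    proj_w = [weight[(u, v)] for (u, _), (v, _) in zip(cycle, succ)]
    return mean(hat_w), mean(proj_w) - cval(limit_set)
```

For instance, with the cycle $(s, D)(a, D)$ where $D = \{(s,a), (a,s)\}$, weights $w(s,a) = 0$, $w(a,s) = 1$, and the placeholder oracle `cval = len`, both sides evaluate to $-3/2$.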


We now describe how to translate winning strategies for either player from $\hat{G}$ back to $G$, i.e. given an optimal maximizing (minimizing) strategy for Eve (Adam) in $\hat{G}$ we construct the corresponding optimal regret minimizing strategy (memoryless regret maximizing counter-strategy) for Eve (Adam) in $G$. For clarity, we follow the same naming convention throughout this section: we say a strategy is an optimal maximizing (minimizing) strategy when we speak about antagonistic and cooperative games, and we say a strategy is optimal regret maximizing (regret minimizing) when we speak about regret games. When this does not suffice, we explicitly state which kind of game we are speaking about.

Let $\varepsilon \in \mathfrak{S}_\exists(\hat{G})$ be an optimal maximizing strategy of Eve in $\hat{G}$ and $\alpha \in \mathfrak{S}_\forall(\hat{G})$ be an optimal minimizing strategy of Adam. Such strategies exist: in [EM79] it was shown that mean-payoff games are positionally determined. We will now define a strategy for Eve in $G$ which, for every play prefix $s$, constructs a valid play prefix $\hat{s}$ in $\hat{G}$ and plays as $\varepsilon$ would in $\hat{G}$ for $\hat{s}$. More formally, for a play prefix $s$ from $G$, denote by $[s]_1^{-1}$ the corresponding sequence of vertex and edge-set pairs in $\hat{G}$ (indeed, $[\cdot]_1^{-1}$ is the inverse function of $[\cdot]_1$, which is easily seen to be bijective on prefixes consistent with positional strategies of Adam). Define $\sigma \in \mathfrak{S}_\exists(G)$ as follows: $\sigma(s) = [\varepsilon([s]_1^{-1})]_1$ for all play prefixes $s \in V^* \cdot V_\exists$ in $G$ consistent with a positional strategy of Adam.
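Computing $[s]_1^{-1}$ only requires replaying the prefix while removing, at each Adam vertex, the edges Adam did not take; the prefix has no preimage exactly when it uses a removed edge, i.e. when Adam leaves some vertex in two different ways. A Python sketch of this lifting and of the resulting strategy $\sigma$ follows; the function names are ours, and `eps` stands for a hypothetical strategy of Eve in $\hat{G}$ given as a function from $\hat{G}$-prefixes to $\hat{G}$-vertices.

```python
def lift_prefix(prefix, edges, eve_vertices):
    """Inverse of the projection [.]_1: lift a play prefix v_0 ... v_k of G to hat-G.

    Returns None if the prefix uses a non-edge or is not consistent with a
    positional strategy of Adam (he leaves some vertex in two different ways).
    """
    E = frozenset(edges)
    C = E
    lifted = [(prefix[0], C)]
    for u, v in zip(prefix, prefix[1:]):
        if (u, v) not in C:
            return None
        if u not in eve_vertices:
            # Adam's choice at u is now fixed for the rest of the play
            C = C - {(x, y) for (x, y) in E if x == u and y != v}
        lifted.append((v, C))
    return lifted


def eve_from_hat(eps, edges, eve_vertices):
    """sigma(s) = [eps([s]_1^{-1})]_1 for a strategy eps of Eve in hat-G.

    eps is assumed to map a hat-G prefix to the hat-G successor it picks.
    """
    def sigma(prefix):
        lifted = lift_prefix(prefix, edges, eve_vertices)
        if lifted is None:
            raise ValueError("prefix not consistent with a positional strategy of Adam")
        return eps(lifted)[0]  # project the chosen hat-vertex back to G
    return sigma
```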

For a fixed strategy of Eve we can translate the optimal minimizing strategy of Adam in $\hat{G}$ into an optimal memoryless regret maximizing counter-strategy of his in $G$. Formally, for an arbitrary strategy $\sigma$ for Eve in $G$, define $\hat{\sigma} \in \mathfrak{S}_\exists(\hat{G})$ as follows: $\hat{\sigma}(\hat{s}) = \sigma([\hat{s}]_1)$ for all $\hat{s} \in \hat{V}^* \cdot \hat{V}_\exists$. Let $\tau_\sigma$ be an optimal (positional) maximizing strategy for Adam in $G{\restriction}\lim [\pi_{\hat{\sigma}\alpha}]_2$.

It is not hard to see that the described strategy of Eve ensures a regret value of at most $-\mathrm{aVal}(\hat{G})$. Slightly less obvious is the fact that, for any strategy $\sigma$ of Eve, the counter-strategy $\tau_\sigma$ of Adam is such that $\sup_{\sigma' \in \mathfrak{S}_\exists} \mathrm{Val}_G(\sigma', \tau_\sigma) - \mathrm{Val}_G(\sigma, \tau_\sigma) \geq -\mathrm{aVal}(\hat{G})$.

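Lemma 6. For payoff functions LimInf, LimSup, $\underline{\mathrm{MP}}$, and $\overline{\mathrm{MP}}$:

$\mathrm{Reg}_{\mathfrak{S}_\exists,\mathfrak{S}^1_\forall}(G) = -\mathrm{aVal}(\hat{G})$.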

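Proof. The proof is decomposed into two parts. First, we describe a strategy $\sigma \in \mathfrak{S}_\exists(G)$ which ensures a regret value of at most $-\mathrm{aVal}(\hat{G})$. Second, we show that for any $\sigma \in \mathfrak{S}_\exists(G)$ there is a $\tau \in \mathfrak{S}^1_\forall(G)$ such that

$$\sup_{\sigma' \in \mathfrak{S}_\exists} \mathrm{Val}_G(\sigma', \tau) - \mathrm{Val}_G(\sigma, \tau) \geq -\mathrm{aVal}(\hat{G}).$$

The result follows.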

We have already mentioned earlier that for a play $\hat{\pi}$ in $\hat{G}$, $[\hat{\pi}]_1$ is a play in $G$. Let $\mathrm{PPref}(G)$ denote the set of all play prefixes consistent with a positional strategy for Adam in $G$. It is not difficult to see that $[\cdot]_1$ is indeed a bijection between plays of $\hat{G}$ and plays of $G$ consistent with positional strategies for Adam.

It follows from the determinacy of antagonistic games defined by the payoff functions considered in this work that there are optimal strategies for Eve and Adam that ensure a payoff of, respectively, at least and at most a value $\mathrm{aVal}(\hat{G})$ against any strategy of the opposing player. Let $\varepsilon \in \mathfrak{S}_\exists(\hat{G})$ be an optimal maximizing strategy of Eve in $\hat{G}$ and $\alpha \in \mathfrak{S}_\forall(\hat{G})$ be an optimal minimizing strategy of Adam.
(First Part). Define a strategy $\sigma \in \mathfrak{S}_\exists(G)$ as follows: $\sigma(s) = [\varepsilon([s]_1^{-1})]_1$ for all $s \in \mathrm{PPref}(G) \cdot V_\exists$. We claim that

$$\mathrm{reg}^{\sigma}_{\mathfrak{S}^1_\forall}(G) \leq -\mathrm{aVal}(\hat{G}).$$

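Towards a contradiction, assume there are $\tau \in \mathfrak{S}^1_\forall(G)$ and $\sigma' \in \mathfrak{S}_\exists(G)$ such that

$$\mathrm{Val}_G(\sigma', \tau) - \mathrm{Val}_G(\sigma, \tau) > -\mathrm{aVal}(\hat{G}).$$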

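Define a strategy $\hat{\tau} \in \mathfrak{S}_\forall(\hat{G})$ as follows: $\hat{\tau}(\hat{s}) = \tau([\hat{s}]_1)$ for all $\hat{s} \in \hat{V}^* \cdot (\hat{V} \setminus \hat{V}_\exists)$. From the definition of $\varepsilon$ and our assumption we get that

$$\mathrm{Val}_G(\sigma', \tau) - \mathrm{Val}_G(\sigma, \tau) > -\mathrm{aVal}(\hat{G}) \geq -\mathrm{Val}_{\hat{G}}(\varepsilon, \hat{\tau}). \quad (7)$$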

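It is straightforward to verify that $[\pi_{\varepsilon\hat{\tau}}]_1 = \pi_{\sigma\tau}$. Therefore, from Lemma 5 we have:

$$\mathrm{Val}(\pi_{\sigma'\tau}) > \mathrm{Val}(\pi_{\sigma\tau}) - \mathrm{Val}(\pi_{\varepsilon\hat{\tau}}) = \mathrm{cVal}(G{\restriction}\lim [\pi_{\varepsilon\hat{\tau}}]_2). \quad (8)$$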

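At this point we note that, since $\tau$ is a positional strategy, $\mathrm{Val}_G(\sigma', \tau)$ is at most the highest payoff value attainable in $G$ restricted to the edges allowed by $\tau$. Formally, if $E_\tau = \{(u, v) \in E \mid u \in V \setminus V_\exists \Rightarrow v = \tau(u)\}$ then $\mathrm{Val}(\pi_{\sigma'\tau}) \leq \mathrm{cVal}(G{\restriction}E_\tau)$. Also, by construction of $\hat{\tau}$ we get that $E_\tau \subseteq \lim [\pi_{\varepsilon\hat{\tau}}]_2$. Since $\mathrm{cVal}$ is monotone with respect to inclusion of edge sets, this implies $\mathrm{cVal}(G{\restriction}\lim [\pi_{\varepsilon\hat{\tau}}]_2) \geq \mathrm{cVal}(G{\restriction}E_\tau)$, which contradicts Equation (8).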


(Second Part). For the second part of the proof we require the following result, which relates positional strategies for Adam in $G$ that agree on certain vertices to strategies in sub-graphs defined by plays in $\hat{G}$.


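Claim 3. Let $\sigma \in \mathfrak{S}_\exists(G)$ and $\tau, \tau' \in \mathfrak{S}^1_\forall(G)$. Then $\pi_{\sigma\tau} = \pi_{\sigma\tau'}$ if and only if $\tau' \in \mathfrak{S}^1_\forall(G{\restriction}\lim [[\pi_{\sigma\tau}]_1^{-1}]_2)$.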

Proof. (only if) Note that by construction of $\hat{G}$, once Adam chooses an edge $((u, C), (v, D))$ from a vertex $(u, C) \in \hat{V} \setminus \hat{V}_\exists$, then on any subsequent visit to a vertex $(u, C') \in \hat{V} \setminus \hat{V}_\exists$ he has no other option but to go to $(v, C')$. That is, his choice is restricted to be consistent with the history of the play. For a play $\hat{\pi}$ in $\hat{G}$, the sequence $[\hat{\pi}]_2$ is the decreasing sequence of sets of edges consistent (for Adam) with the history of the play in the same manner. In particular, for any $\tau' \in \mathfrak{S}^1_\forall(G)$ and any play $\pi$ in $G$ consistent with $\tau'$, we have that $\tau'$ is a valid strategy for Adam in $G{\restriction}E'$ where $E' = \lim [[\pi]_1^{-1}]_2$. As $\pi_{\sigma\tau} = \pi_{\sigma\tau'}$ is a play consistent with $\tau'$, the result follows.

(if) Suppose $\pi_{\sigma\tau} \neq \pi_{\sigma\tau'}$ and let $v$ be the last vertex in their common prefix. As $\sigma$ is common to both plays, we have $v \in V \setminus V_\exists$ and $\tau(v) \neq \tau'(v)$. In particular, $(v, \tau'(v)) \notin \lim [[\pi_{\sigma\tau}]_1^{-1}]_2$, so $\tau' \notin \mathfrak{S}^1_\forall(G{\restriction}\lim [[\pi_{\sigma\tau}]_1^{-1}]_2)$. □

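For an arbitrary strategy $\sigma$ for Eve in $G$, define $\hat{\sigma} \in \mathfrak{S}_\exists(\hat{G})$ as follows: $\hat{\sigma}(\hat{s} \cdot (v, D)) = (\sigma([\hat{s} \cdot (v, D)]_1), D)$ for all $\hat{s} \cdot (v, D) \in \hat{V}^* \cdot \hat{V}_\exists$. Let $\tau_\sigma$ be an optimal (positional) maximizing strategy for Adam in $G{\restriction}\lim [\pi_{\hat{\sigma}\alpha}]_2$. We claim that for all $\sigma \in \mathfrak{S}_\exists(G)$ we have that

$$\sup_{\sigma' \in \mathfrak{S}_\exists(G)} \mathrm{Val}_G(\sigma', \tau_\sigma) - \mathrm{Val}_G(\sigma, \tau_\sigma) \geq -\mathrm{aVal}(\hat{G}).$$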

Towards a contradiction, assume that for some $\sigma \in \mathfrak{S}_\exists(G)$ the left-hand side of the above inequality is strictly smaller than the right-hand side. By definition of $\alpha$ we then get the following inequality:

$$\sup_{\sigma' \in \mathfrak{S}_\exists(G)} \mathrm{Val}_G(\sigma', \tau_\sigma) - \mathrm{Val}_G(\sigma, \tau_\sigma) < -\mathrm{aVal}(\hat{G}) \leq -\mathrm{Val}_{\hat{G}}(\hat{\sigma}, \alpha). \quad (9)$$

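Using Claim 3 it is easy to show that $[\pi_{\hat{\sigma}\alpha}]_1 = \pi_{\sigma\tau_\sigma}$. Hence, by Equation (9) and Lemma 5 we get that:

$$\sup_{\sigma' \in \mathfrak{S}_\exists(G)} \mathrm{Val}(\pi_{\sigma'\tau_\sigma}) < \mathrm{cVal}(G{\restriction}\lim [\pi_{\hat{\sigma}\alpha}]_2). \quad (10)$$

However, by choice of $\tau_\sigma$, we know that there is a strategy $\sigma'' \in \mathfrak{S}_\exists(G)$ such that $\mathrm{Val}(\pi_{\sigma''\tau_\sigma}) = \mathrm{cVal}(G{\restriction}\lim [\pi_{\hat{\sigma}\alpha}]_2)$. This contradicts Equation (10) and completes the proof of the lemma. □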
If $\hat{G}$ was constructed from an Inf or Sup game $H$, then one could easily transfer the described strategy $\sigma$ of Eve into a strategy for her in $H$ which achieves the same regret. In order to have a symmetric result we still lack the ability to transfer a strategy of Adam from $\hat{G}$ to the original game $H$. Consider a modified construction in which we additionally keep track of the minimal (resp. maximal) weight seen so far by a play, as described in Sect. 3. Denote the corresponding game by $\tilde{G}$. The vertex set of $\tilde{G}$ is thus a set of triples of the form $(v, C, x)$, where $x$ is the minimal (resp. maximal) weight the play has witnessed. We observe that in the proof of the above result, the intuition behind why we can transfer a strategy of Adam from $\hat{G}$ back to $G$ as a memoryless strategy, although the vertices in $\hat{G}$ already encode additional information, is that once we have fixed a strategy of Eve in $G$, this gives us enough information about the prefix of the play before visiting any Adam vertex. In other words, we construct a strategy of Adam tailored to spoil a specific strategy $\sigma$ of Eve in $G$, using the information we gather from $[\cdot]_1^{-1}$ and his optimal strategy in $\hat{G}$. These properties still hold in $\tilde{G}$. Thus, we get the following result.

Lemma 7. For payoff functions Inf, Sup: $\mathrm{Reg}_{\mathfrak{S}_\exists,\mathfrak{S}^1_\forall}(G) = -\mathrm{aVal}(\tilde{G})$.

We recall a result from [EM79] which gives us an algorithm for computing the value $\mathrm{Reg}_{\mathfrak{S}_\exists,\mathfrak{S}^1_\forall}(G)$ in polynomial space. In [EM79] the authors show that the value of a mean-payoff game $G$ is equal to the value of a finite cycle forming game $\Gamma_G$ played on $G$. The game is identical to the mean-payoff game except that it is finite: it stops as soon as a cycle is formed, and its value is given by the mean-payoff value of that cycle.

Proposition 1 (Finite Mean-Payoff Game [EM79]). The value of a mean-payoff game $G$ is equal to the value of the finite cycle forming game $\Gamma_G$ played on the same weighted arena.
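To make the cycle forming game concrete, the following brute-force Python sketch evaluates it on a small arena: Eve maximizes, Adam minimizes, and the payoff is the average weight of the first cycle closed. The function name and input encoding (successor lists and a weight dictionary) are our own choices, and the exhaustive search is exponential; it only illustrates the game. Note that the recursion depth never exceeds the number of vertices, which is the kind of polynomial bound on play length that an alternating-time argument relies on.

```python
from fractions import Fraction


def cycle_game_value(succ, weight, eve_vertices, start):
    """Brute-force value of the finite cycle forming game on a weighted arena.

    Eve (maximizer) and Adam (minimizer) extend a simple path from start; the
    game stops as soon as a vertex repeats, with payoff the mean weight of the
    cycle just closed. Every vertex is assumed to have at least one successor.
    """
    def value(path):
        u = path[-1]
        outcomes = []
        for v in succ[u]:
            if v in path:  # the move (u, v) closes a cycle: the game stops
                cyc = path[path.index(v):] + [v]
                ws = [weight[(a, b)] for a, b in zip(cyc, cyc[1:])]
                outcomes.append(Fraction(sum(ws), len(ws)))
            else:
                outcomes.append(value(path + [v]))
        return max(outcomes) if u in eve_vertices else min(outcomes)

    return value([start])
```

On an arena where Eve at `s` chooses between a vertex `a` (where Adam may loop with weight 3 or return to `s` with weight 0) and a vertex `b` with a self-loop of weight 1, the value is 1: entering `a` lets Adam close the cycle of mean 0 through `s`.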

As LimInf and LimSup games are also equivalent to their finite cycle forming game (see [AR14]), it follows that one can use an Alternating Turing Machine to compute the value of a game, and that said machine will stop in time bounded by the length of the longest simple cycle in the arena. We note that the length of the longest simple path in $\hat{G}$ is bounded by $|V|(|E| + 1)$: along a path the edge-set component takes at most $|E| + 1$ distinct values and, while it is fixed, at most $|V|$ distinct vertices can be visited. Hence, we can compute the winner of $\hat{G}$ in alternating polynomial time. Since APTIME = PSPACE, this concludes the proof of Lemma 4.

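Lower bounds. We give a reduction from the QSAT Problem to the problem of determining whether, given $r \in \mathbb{Q}$, $\mathrm{Reg}_{\mathfrak{S}_\exists,\mathfrak{S}^1_\forall}(G) \mathrel{\triangleleft} r$ for the payoff functions LimInf, $\underline{\mathrm{MP}}$, and $\overline{\mathrm{MP}}$ (for $\triangleleft \in \{<, \leq\}$). Then we provide a reduction from the complement of the 2-disjoint-paths Problem for LimSup, Sup, and Inf.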

The crux of the reduction from QSAT is a gadget for each clause of the QSAT formula. Visiting this gadget allows Eve to gain information about the highest payoff obtainable in the gadget. Each entry point corresponds to a literal from the clause, and the literal is visited when it is made true by the valuation of variables chosen by Eve and Adam in the reduction described below. Figure 7 depicts an instance of the gadget for a particular clause. Let us focus on the mean-payoff function. Note that staying in the inner 6-vertex triangle would yield a mean-payoff value of 4. However, in order to do so, Adam needs to cooperate with Eve at all three corner vertices. Also note that if he does cooperate in at least one of these vertices, then Eve can secure a payoff value of at least $\frac{11}{3}$.

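Lemma 8. For $r \in \mathbb{Q}$, weighted arena $G$ and payoff function LimInf, $\underline{\mathrm{MP}}$, or $\overline{\mathrm{MP}}$, determining whether $\mathrm{Reg}_{\mathfrak{S}_\exists,\mathfrak{S}^1_\forall}(G) \mathrel{\triangleleft} r$, for $\triangleleft \in \{<, \leq\}$, is PSPACE-hard.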

The QSAT Problem asks whether a given fully quantified boolean formula (QBF) is satisfiable. The problem is known to be PSPACE-complete [GJ79], and the result holds even if the formula is assumed to be in conjunctive normal form with three literals per clause (also known as 3-CNF). Therefore, w.l.o.g., we consider an instance of the QSAT Problem to be given in the following form:

$$\exists x_0 \forall x_1 \exists x_2 \cdots \Phi(x_0, x_1, \ldots, x_n)$$

where $\Phi$ is in 3-CNF. Also w.l.o.g., we assume that every clause that is not trivially true has at least one existentially quantified variable (as otherwise the answer to the problem is trivial).

It is common to consider a QBF as a game between an existential and a universal player. The existential player chooses a truth value for an existentially quantified variable $x_i$ and the universal player responds with a truth value for $x_{i+1}$. After $n$ turns the truth value of $\Phi$ determines the winner: the existential player wins if $\Phi$ is true and the universal player wins otherwise. The game we shall construct mimics the choices of the existential and universal player and makes sure that the regret of the game is small if and only if $\Phi$ is true.

Figure 6: Depiction of the reduction from QBF.

Figure 7: Clause gadget for the QBF reduction for clause $x_i \lor \neg x_j \lor x_k$.
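The QBF game between the existential and the universal player can be sketched as a short recursive evaluator in Python. This is only an illustration of the alternation, using our own encoding of literals as pairs `(k, positive)`; unlike the formula above, the parameter `num_vars` counts the variables, so $x_0, \ldots, x_n$ corresponds to `num_vars = n + 1`.

```python
def qbf_true(clauses, num_vars, assignment=()):
    """Evaluate  Ex_0 Ax_1 Ex_2 ... Phi  by playing the QBF game exhaustively.

    clauses: CNF as a list of clauses, each a list of literals (k, positive),
             meaning x_k if positive is True and (not x_k) otherwise.
    Even-indexed variables belong to the existential player, odd-indexed ones
    to the universal player; values are chosen in the order x_0, x_1, ...
    """
    k = len(assignment)
    if k == num_vars:  # all variables chosen: Phi decides the winner
        return all(any(assignment[var] == pos for var, pos in clause)
                   for clause in clauses)
    outcomes = (qbf_true(clauses, num_vars, assignment + (b,))
                for b in (False, True))
    return any(outcomes) if k % 2 == 0 else all(outcomes)
```

For example, $\exists x_0 \forall x_1\, (x_0 \lor x_1) \land (x_0 \lor \neg x_1)$ evaluates to true (the existential player sets $x_0$ to true), whereas $\exists x_0 \forall x_1\, x_1$ evaluates to false.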

Let us first consider the strict regret threshold problem. We will construct a weighted arena $G$ in which Eve wins the strict regret threshold problem for threshold 2 if and only if $\Phi$ is true.

Lemma 9. For weighted arena $G$ and payoff function LimInf, $\underline{\mathrm{MP}}$, or $\overline{\mathrm{MP}}$, determining whether $\mathrm{Reg}_{\mathfrak{S}_\exists,\mathfrak{S}^1_\forall}(G) < 2$ is PSPACE-hard.

Proof. We first describe the value-choosing part of the game (see Figure 6). $V_\exists$ contains vertices for every existentially quantified variable from the QBF and $V \setminus V_\exists$ contains vertices for every universally quantified variable. At each of these vertices, there are two outgoing edges with weight 0 corresponding to a choice of truth value for the variable. For the variable $x_i$ vertex, the true edge leads to a vertex from which Eve can choose to move to any of the clause gadgets corresponding to clauses where the literal $x_i$ occurs (see the dotted incoming edges in Figure 7) or to advance to $x_{i+1}$. The false edge construction is similar, while leading to the literal $\neg x_i$ rather than to $x_i$. From the vertices encoding the choice of truth value for $x_n$, Eve can either visit the clause gadgets for it or move to a "final" vertex $\Phi \in V_\exists$. This final vertex has a self-loop with weight 2.
To conclude the proof, we describe the strategy of Eve which ensures the desired property if the QBF is satisfiable, and a strategy of Adam which ensures the property is falsified otherwise.

Assume the QBF is true. It follows that there is a strategy of the existential player in the QBF game such that for any strategy of the universal player the QBF will be true after they both choose values for the variables. Eve now follows this strategy while visiting all clause gadgets corresponding to occurrences of chosen literals. At every clause gadget she visits, she chooses to enter the gadget. If Adam now decides to take the weight 4 edge, Eve can achieve a mean-payoff value of $\frac{11}{3}$ or a LimInf value of 3 by staying in the gadget. In this case the claim trivially holds. We therefore focus on the case where Adam chooses to take Eve back to the vertex from which she entered the gadget. She can now go to the next clause gadget and repeat. Thus, when the play reaches vertex $\Phi$, Eve must have visited every clause gadget and Adam has chosen to disallow a weight 4 edge in every gadget. Now Eve can ensure a payoff value of 2 by going to $\Phi$. As she has witnessed that in every clause gadget there is at least one vertex in which Adam is not helping her, alternative strategies might have ensured a mean-payoff of at most $\frac{11}{3}$ and a LimInf value of at most 3. Thus, her regret is less than 2.

Conversely, if the universal player had a winning strategy (in other words, the QBF was not satisfiable), then the strategy of Adam consists in following this strategy in choosing values for the variables and taking Eve out of clause gadgets if she ever enters one. If the play arrives at $\Phi$, there is at least one clause gadget that was not visited by the play. We note there is an alternative strategy of Eve which, by choosing a different valuation of some variable, reaches this clause gadget and, with the help of Adam, achieves value 4. Hence, this strategy of Adam ensures regret of exactly 2. If Eve avoids reaching $\Phi$, then she can ensure a value of at most 0, which means an even greater regret for her. □

We observe that the above reduction can be readily parameterized. That is, we can replace the values 4, 3 and 2 by arbitrary values $A$, $B$, $C$ satisfying the following constraints:

• $A > B > C$,

• $\frac{2A + B}{3} - C < r$ so that Eve wins if $\Phi$ is true,

• $A - C \geq r$ so that Adam wins if $\Phi$ is false, and

• $A - \frac{2A + B}{3} < r$ so that he never helps Eve in the clause gadgets.

Indeed, the valuation of $A$, $B$, $C$ we chose, 4, 3, 2 with $r = 2$, satisfies these inequalities. It is not hard to see that if we find a valuation for $r$, $A$, $B$, $C$ which meets the first restriction and the last three with strict and non-strict inequalities swapped, we get a reduction that works for the non-strict regret threshold problem. That is, find values such that

• $A > B > C$,

• $\frac{2A + B}{3} - C \leq r$ so that Eve wins if $\Phi$ is true,

• $A - C > r$ so that Adam wins if $\Phi$ is false, and

• $A - \frac{2A + B}{3} \leq r$ so that he never helps Eve in the clause gadgets.

For example, one could consider $A = 10$, $B = 7$, $C = 5$ and $r = 4$.
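The two sets of constraints can be checked mechanically with exact rational arithmetic. The Python sketch below assumes that the gadget's value under partial cooperation is $\frac{2A+B}{3}$ (our reading of the constraints, giving $\frac{11}{3}$ for $A = 4$, $B = 3$); the function name is ours.

```python
from fractions import Fraction


def gadget_constraints(A, B, C, r, strict=True):
    """Check the parameter constraints of the QBF clause gadget (sketch).

    Assumes the gadget's partial-cooperation mean-payoff value is (2A + B) / 3.
    strict=True checks the constraints for the strict threshold problem,
    strict=False those for the non-strict one.
    """
    m = Fraction(2 * A + B, 3)
    if strict:
        return A > B > C and m - C < r and A - C >= r and A - m < r
    return A > B > C and m - C <= r and A - C > r and A - m <= r
```

With these definitions, $(A, B, C, r) = (4, 3, 2, 2)$ satisfies the strict constraints but not the non-strict ones (since $A - C = r$), while $(10, 7, 5, 4)$ satisfies the non-strict ones but not the strict ones (since $\frac{2A+B}{3} - C = 9 - 5 = r$).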

Lemma 10. For weighted arena $G$ and payoff function LimInf, $\underline{\mathrm{MP}}$, or $\overline{\mathrm{MP}}$, determining whether $\mathrm{Reg}_{\mathfrak{S}_\exists,\mathfrak{S}^1_\forall}(G) \leq 4$ is PSPACE-hard.

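Lemma 11. For $r \in \mathbb{Q}$, weighted arena $G$ and payoff function Inf, Sup, or LimSup, determining whether $\mathrm{Reg}_{\mathfrak{S}_\exists,\mathfrak{S}^1_\forall}(G) \mathrel{\triangleleft} r$, for $\triangleleft \in \{<, \leq\}$, is coNP-hard.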

Proof. We provide a reduction from the complement of the 2-disjoint-paths Problem on directed graphs [ET98]. As the problem is known to be NP-complete, the result follows. In other words, we sketch how to translate a given instance of the 2-disjoint-paths Problem into a weighted arena in which Eve can ensure a regret value strictly less than 1 if and only if the answer to the 2-disjoint-paths Problem is negative.
Figure 8: Regret gadget for 2-disjoint-paths reduction.


Consider a directed graph $G$ and distinct vertex pairs $(s_1, t_1)$ and $(s_2, t_2)$. W.l.o.g. we assume that for all $i \in \{1, 2\}$: (i) $t_i$ is reachable from $s_i$, and (ii) $t_i$ is a sink (i.e. has no outgoing edges) in $G$. We now describe the changes we apply to $G$ in order to get the underlying graph structure of the weighted arena, and then comment on the weight function. Let all vertices from $G$ be Adam vertices and $s_1$ be the initial vertex. We replace all edges incident on $t_1$ (edges of the form $(v, t_1)$, for some $v$) by a copy of the gadget shown in Figure 8. Next, we add self-loops on $t_1$ and $t_2$ with weights 1 and 2, respectively. Finally, the weights of all remaining edges are 0.

We claim that, in this weighted arena, Eve can ensure regret strictly less than 1 (for payoff functions Sup and LimSup) if and only if in $G$ the vertex pairs $(s_1, t_1)$ and $(s_2, t_2)$ cannot be joined by vertex-disjoint paths. Indeed, we claim that the strategy that minimizes the regret of Eve is the strategy that, in states where she has a choice, tells her to go to $t_1$.

First, let us prove that this strategy has regret strictly less than 1 if and only if no two disjoint paths exist in the graph between the pairs of states $(s_1, t_1)$ and $(s_2, t_2)$. Assume the latter is the case. Then if Adam chooses to always avoid $t_1$, clearly the regret is 0. If $t_1$ is eventually reached, then the choice of Eve secures a value of 1 (for all payoff functions). Note that if she had chosen to go towards $s_2$ instead, as there are no two disjoint paths, we know that either the path constructed from $s_2$ by Adam never reaches $t_2$, and then the value of the path is 0 (and the regret is 0 for Eve), or the path constructed from $s_2$ reaches $t_1$ again since Adam is playing positionally (and, again, the regret is 0 for Eve). Now assume that two disjoint paths between the source-target pairs exist. If Eve changed her strategy to go towards $s_2$ (instead of choosing $t_1$), then Adam has a strategy to reach $t_2$ and achieve a payoff of 2. Thus, her regret would be equal to 1.

Second, we claim that any other strategy of Eve has a regret greater than or equal to 1. Indeed, if Eve decides to go towards $s_2$ (instead of choosing to go to $t_1$) then Adam can choose to loop on the state before $s_2$, and the payoff in this case is 0. Hence, the regret of Eve is at least 1.

Note that minimal changes are required for the same construction to imply the result for Inf. Further, 
the weight function and threshold r can be accommodated so that Eve wins for the non-strict regret 
threshold. Hence, the general result follows. □ 

Memory requirements for Eve. It follows from our algorithms for computing regret in this variant 
that Eve only requires strategies with exponential memory. Examples where exponential memory is 
necessary can be easily constructed. 

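Corollary 2. For all payoff functions Sup, Inf, LimSup, LimInf, $\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$, and for all game graphs $G$, there exists $m \in 2^{O(|G|)}$ such that:

$\mathrm{Reg}_{\mathfrak{S}_\exists,\mathfrak{S}^1_\forall}(G) = \mathrm{Reg}_{\mathfrak{S}^m_\exists,\mathfrak{S}^1_\forall}(G)$.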

5 Variant III: Adam plays word strategies 

For this variant, we provide tight upper and lower bounds for all the payoff functions: the regret threshold problem is EXPTIME-complete for Sup, Inf, LimSup, and LimInf, and undecidable for $\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$. For the latter case, decidability can be recovered when we fix a priori the size of the memory that Eve can use to play; the decision problem is then NP-complete. Finally, we show that our notion of regret minimization for word strategies generalizes the notion of good for games introduced by Henzinger and Piterman in [HP06], and we also formalize the relation that exists with the notion of determinization by pruning for weighted automata introduced by Aminof et al. in [AKL10].

Additional definitions. We say that a strategy of Adam is a word strategy if his strategy can be expressed as a function $\tau : \mathbb{N} \to [\max\{\deg^+(v) \mid v \in V\}]$, where $[n] = \{i \mid 1 \leq i \leq n\}$ and $\deg^+(v)$ is the outdegree of $v$ (i.e. the number of edges leaving $v$). Intuitively, we consider an order on the successors of each Adam vertex. On every turn, the strategy $\tau$ of Adam will tell him to move to the $i$-th successor (or to a sink state, if its outdegree is less than $i$) of the vertex according to the fixed order. We denote by $\mathfrak{W}_\forall$ the set of all such strategies for Adam. When considering word strategies, it is more natural to see the arena as a (weighted) automaton.
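The following Python sketch shows how a word strategy, given as a finite sequence of indices, drives a play prefix. It is illustrative only: the names are ours, and since the text does not fix how turn numbers interact with Eve's moves, we assume that one index is consumed per turn and simply ignored when Eve moves.

```python
def follow_word(word, order, adam_vertices, eve_choice, start, sink="sink"):
    """Play prefix induced by a word strategy of Adam (illustrative sketch).

    word: indices tau(0), tau(1), ..., each in {1, ..., max outdegree}
    order: order[v] is the fixed ordered list of successors of v
    adam_vertices: set of Adam's vertices
    eve_choice: Eve's strategy, mapping the current prefix to a successor
    Assumption: one index is consumed per turn; on Eve's turns it is ignored.
    """
    play = [start]
    for i in word:
        v = play[-1]
        if v == sink:
            break  # the sink has no outgoing moves
        if v in adam_vertices:
            succs = order[v]
            # i-th successor in the fixed order, or the sink if deg+(v) < i
            play.append(succs[i - 1] if i <= len(succs) else sink)
        else:
            play.append(eve_choice(play))
    return play
```

For instance, with `order = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}`, Adam owning `a` and `b`, and Eve always moving from `c` to `a`, the word `[2, 1, 1, 3]` from `a` yields the prefix `a, c, a, b, sink`: the last index exceeds the outdegree of `b`.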

A weighted automaton is a tuple $\Gamma = (Q, q_I, \Sigma, \Delta, w)$ where $\Sigma$ is a finite alphabet, $Q$ is a finite set of states, $q_I$ is the initial state, $\Delta \subseteq Q \times \Sigma \times Q$ is the transition relation, and $w : \Delta \to \mathbb{Z}$ assigns weights to transitions. A run of $\Gamma$ on a word $a_0 a_1 \ldots \in \Sigma^\omega$ is a sequence $\rho = q_0 a_0 q_1 a_1 \ldots \in (Q \times \Sigma)^\omega$ such that $(q_i, a_i, q_{i+1}) \in \Delta$ for all $i \geq 0$, and has value $\mathrm{Val}(\rho)$ determined by the sequence of weights of the transitions of the run and the payoff function. The value $\Gamma$ assigns to a word is the supremum of the values of all its runs on the word. We say the automaton is deterministic if $\Delta$ is functional.

A game in which Adam plays word strategies can be reformulated as a game played on a weighted automaton $\Gamma = (Q, q_I, \Sigma, \Delta, w)$, where strategies of Adam, of the form $\tau : \mathbb{N} \to \Sigma$, determine a sequence of input symbols to which Eve has to react by choosing $\Delta$-successor states starting from $q_I$. In this setting a strategy of Eve which minimizes regret defines a run by resolving the non-determinism of $\Delta$ in $\Gamma$, and ensures that the difference between the value of the constructed run and the value of the best run on the word spelled out by Adam is minimal. For instance, if all vertices in Fig. 1 are replaced by states, Eve can choose the successor of $v_1$ regardless of what letter Adam plays, and from $v_2$ and $v_3$ Adam chooses the successor by choosing to play $a$ or $b$. Furthermore, his choice of letter tells Eve what would have happened had the play been at the other state.

The following result summarizes the results of this section: 

Theorem 3. Deciding if the regret value is less than a given threshold (strictly or non-strictly) playing against word strategies of Adam is EXPTIME-complete for Inf, Sup, LimInf, and LimSup; it is undecidable for $\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$.

Upper bounds. There is an EXPTIME algorithm for solving the regret threshold problem for Inf, Sup, LimInf, and LimSup. This algorithm is obtained by a reduction to parity and Streett games.

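Lemma 12. For $r \in \mathbb{Q}$, weighted automaton $\Gamma$ and payoff function Inf, Sup, LimInf, or LimSup, determining whether $\mathrm{Reg}_{\mathfrak{S}_\exists,\mathfrak{W}_\forall}(\Gamma) \mathrel{\triangleleft} r$, for $\triangleleft \in \{<, \leq\}$, can be done in exponential time.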

We show how to decide the strict regret threshold problem. However, the same algorithm can be adapted for the non-strict version by changing the strictness of the inequalities used to define the parity/Streett accepting conditions.

Proof. We focus on the LimInf and LimSup payoff functions. The result for Inf and Sup follows from the translation to LimInf and LimSup games given in Sect. 3. Our decision algorithm consists in first building a deterministic automaton for $\Gamma = (Q_1, q_I, \Sigma, \Delta_1, w_1)$ using the construction provided in [CDH10]. We denote by $D_\Gamma = (Q_2, s_I, \Sigma, \Delta_2, w_2)$ this deterministic automaton, and we know that it is at most exponentially larger than $\Gamma$. Next, we consider a simulation game played by Eve and Adam on the automata $\Gamma$ and $D_\Gamma$. The game is played for an infinite number of rounds and builds runs in the two automata. It starts with the two automata in their respective initial states $(q_I, s_I)$, and if the current states are $q_1$ and $q_2$, then the next round is played as follows:

• Adam chooses a letter $a \in \Sigma$, and the state of the deterministic automaton is updated accordingly, i.e. $q_2' = \Delta_2(q_2, a)$, then

• Eve updates the state of the non-deterministic automaton to $q_1'$ by reading $a$ using one of the edges labelled with $a$ in the current state, i.e. she chooses $q_1'$ such that $q_1' \in \Delta_1(q_1, a)$. The new state of the game is $(q_1', q_2')$.
Eve wins the simulation game if the minimal weight seen infinitely often in the run of the non-deterministic automaton is larger than or equal to the minimal weight seen infinitely often in the deterministic automaton minus $r$. It should be clear that this happens exactly when Eve has a regret bounded by $r$ in the original regret game on the word which is spelled out by Adam.

Let us focus on the LimInf payoff function now. We will sketch how this game can be translated into a parity game. For completeness, we now provide a formal definition of the latter. A parity game is a pair $(G, \Omega)$ where $G$ is a non-weighted arena and $\Omega : V \to \mathbb{N}$ is a function that assigns a priority to each vertex. Plays, strategies, and other notions are defined as with games played on weighted arenas. A play in a parity game induces an infinite sequence of priorities. We say a play is winning for Eve if and only if the minimal priority seen infinitely often is odd. The parity index of a parity game is the number of priorities labelling its vertices, that is $|\{\Omega(v) \mid v \in V\}|$.

To obtain the translation, we keep the structure of the game as above but we assign priorities to the edges of the game instead of weights. We do it in the following way. If $X = \{x_1, x_2, \ldots, x_n\}$ is the ordered set of weight values that appear in the automata (note that $|X|$ is bounded by the number of edges in the non-deterministic automaton), then we need the set of priorities $D = \{2, \ldots, 2n + 1\}$. We assign priorities to edges in the game as follows:

• when Adam chooses a letter $a$ from $q_2$, if the weight that labels the edge that leaves $q_2$ with letter $a$ in the deterministic automaton is equal to $x_i \in X$, then the priority is set to $2i + 1$,

• when Eve updates the non-deterministic automaton from $q_1$ with an edge labelled with weight $w$, then the priority is set to $2i$, where $i$ is the index in $X$ such that $x_{i-1} < w + r \leq x_i$.

It should be clear then that along a run, the minimal priority seen infinitely often is odd if and only if the corresponding run is winning for Eve in the simulation game. So it now remains to solve a parity game with exponentially many states and polynomially many priorities w.r.t. the size of $\Gamma$. This can be done in exponential time with classical algorithms for parity games.
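The priority assignment can be sanity-checked in Python. Since priorities are monotone in the weights, the minimal priority seen infinitely often is the minimum of the Adam priority of $m_{det}$ and the Eve priority of $m_{nd}$, the minimal weights seen infinitely often in the deterministic and non-deterministic runs; the sketch compares its parity with the weight-based winning condition. It uses 0-based indices (so its priorities are shifted with respect to $D$), the function names are ours, and the boundary convention for Eve's priority (counting the $x_j$ below $w + r$) is our reading of the condition above; the `strict` flag selects the strict ($x_j < w + r$) or non-strict ($x_j \leq w + r$) variant.

```python
from bisect import bisect_left, bisect_right


def make_priorities(X, r, strict=True):
    """Priority functions for the simulation game (0-based sketch).

    X: sorted list of the distinct weights x_0 < ... < x_{n-1} of both automata.
    Adam's move reading weight x_j in the deterministic automaton gets 2j + 1;
    Eve's move with weight w gets 2i, where i counts the x_j with x_j < w + r
    (strict) or x_j <= w + r (non-strict).
    """
    index = {x: j for j, x in enumerate(X)}

    def adam(x):
        return 2 * index[x] + 1

    def eve(w):
        i = bisect_left(X, w + r) if strict else bisect_right(X, w + r)
        return 2 * i

    return adam, eve


def eve_wins_by_weights(m_nd, m_det, r, strict=True):
    """Winning condition stated on the minimal weights seen infinitely often."""
    return m_det - m_nd < r if strict else m_nd >= m_det - r
```

Because Adam's priorities are odd and Eve's are even, the minimum is odd exactly when $x_{j} < m_{nd} + r$ (resp. $\leq$) for the index $j$ of $m_{det}$, which is the regret condition.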

LimSup to Streett games. Let us now focus on LimSup. In this case we will reduce our problem to that of determining the winner of a Streett game with state space exponential w.r.t. the original game, but with a number of Streett pairs polynomial w.r.t. the original game. Recall that a Streett game is a pair $(G, \mathcal{F})$ where $G$ is a game graph (with no weight function) and $\mathcal{F} \subseteq \mathcal{P}(V) \times \mathcal{P}(V)$ is a set of Streett pairs. We say a play is winning for Eve if and only if for all pairs $(E, F) \in \mathcal{F}$, if a vertex in $E$ is visited infinitely often then some vertex in $F$ is visited infinitely often as well.
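For an ultimately periodic play, the vertices visited infinitely often are exactly those on its cycle, so the Streett condition reduces to a check on that set. A minimal Python sketch of this check, with a function name of our own:

```python
def streett_winning(inf_vertices, pairs):
    """Streett condition for Eve on the set of vertices visited infinitely often.

    pairs: iterable of (E, F) vertex sets; Eve wins iff every pair whose E is
    visited infinitely often also has some vertex of F visited infinitely often.
    """
    inf_vertices = set(inf_vertices)
    return all(not (inf_vertices & set(E)) or bool(inf_vertices & set(F))
               for E, F in pairs)
```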

Consider a LimSup automaton $\Gamma = (Q, q_I, \Sigma, \Delta, w)$. For $x_i \in \{w(d) \mid d \in \Delta\}$ let us denote by $A_{\geq x_i}$ the Büchi automaton whose Büchi transition set consists of all transitions with weight at least $x_i$. We denote by $D_{\geq x_i} = (Q_i, q_{I,i}, \Sigma, \delta_i, \Omega_i)$ the deterministic parity automaton with the same language as $A_{\geq x_i}$. From [Pit07] we have that $D_{\geq x_i}$ has at most $2|Q|^{|Q|}|Q|!$ states and parity index $2|Q|$ (the number of priorities). Now, let $x_1 < x_2 < \cdots < x_l$ be the weights appearing in transitions of $\Gamma$. We construct the (non-weighted) arena $G_\Gamma = (V, V_\exists, E, v_I)$ and Streett pair set $\mathcal{F}$ as follows:

• $V = (Q \times \prod_{i=1}^{l} Q_i) \cup (Q \times \prod_{i=1}^{l} Q_i \times \Sigma) \cup (Q \times \prod_{i=1}^{l} Q_i \times \Sigma \times Q)$;

• $V_\exists = Q \times \prod_{i=1}^{l} Q_i \times \Sigma$;

• $E$ contains

  - $((p, p_1, \ldots, p_l), (p, p_1, \ldots, p_l, a))$ for all $a \in \Sigma$,

  - $((p, p_1, \ldots, p_l, a), (p, p_1, \ldots, p_l, a, q))$ if $(p, a, q) \in \Delta$,

  - $((p, p_1, \ldots, p_l, a, q), (q, q_1, \ldots, q_l))$ if for all $1 \leq i \leq l$: $(p_i, a, q_i) \in \delta_i$;

• For all $1 \leq i \leq l$ and all even $y \in \mathrm{Range}(\Omega_i)$, $\mathcal{F}$ contains the pair $(E_{i,y}, F_{i,y})$ where

  - $E_{i,y} = \{(p, p_1, \ldots, p_l, a, q) \mid \Omega_i(p_i, a, \delta_i(p_i, a)) = y\}$, and

  - $F_{i,y} = \{(p, p_1, \ldots, p_l, a, q) \mid (\Omega_i(p_i, a, \delta_i(p_i, a)) < y \land \Omega_i(p_i, a, \delta_i(p_i, a)) \bmod 2 = 1) \lor w(p, a, q) > x_i - r\}$.
³ Since δ_i is deterministic, we sometimes write δ_i(p, a) to denote the unique q ∈ Q_i such that (p, a, q) ∈ δ_i.
Figure 9: Initial gadget used in reduction from countdown games.


It is not hard to show that, in the resulting Streett game, a strategy σ of Eve is winning against every
strategy τ of Adam if and only if, for every automaton 𝒟_{≥x_i} which accepts the word induced by τ,
the run of Γ induced by σ has payoff strictly greater than x_i − r. Hence Eve wins the Streett game if and
only if she has a strategy in Γ that ensures regret less than r.

Note that the number of Streett pairs in G_r is polynomial w.r.t. the size of Γ, i.e.

$$\sum_{i=1}^{l} |\mathrm{Range}(\pi_i)| \le l \cdot 2|Q| \le |Q|^2 \cdot 2|Q| = 2|Q|^3.$$

From [PP06] we have that Streett games can be solved in time O(n m^{k+1} k k!), where n is the number of
states, m the number of transitions and k the number of pairs in 𝓕. Thus, in this case we have that G_r
can be solved in time

$$O\left((2|Q|^{|Q|}|Q|!)^{3+2|Q|^3} \cdot 2|Q|^3 \cdot (2|Q|^3)!\right),$$

which is still exponential w.r.t. the size of Γ. □

Lower bounds. We first establish EXPTIME-hardness for the payoff functions Inf, Sup, LimInf, and
LimSup by giving a reduction from countdown games [JSL08]. That is, we show that given a countdown
game, we can construct a game where Eve ensures regret less than 2 if and only if Counter wins in the
original countdown game.

Lemma 13. For r ∈ ℚ, weighted automaton Γ and payoff function Inf, Sup, LimInf, or LimSup, determining
whether $\mathrm{Reg}^{\Sigma_\exists}_{\mathfrak{W}_\forall}(\Gamma) \lhd r$, for ⊲ ∈ {<, ≤}, is EXPTIME-hard.

Let us first formalize what a countdown game is. A countdown game C consists of a weighted graph
(S, T), where S is the set of states and T ⊆ S × (ℕ \ {0}) × S is the transition relation, and a target
value N ∈ ℕ. If t = (s, d, s') ∈ T then we say that the duration of the transition t is d. A configuration
of a countdown game is a pair (s, c), where s ∈ S is a state and c ∈ ℕ. A move of a countdown game
from a configuration (s, c) consists in player Counter choosing a duration d such that (s, d, s') ∈ T for
some s' ∈ S, followed by player Spoiler choosing s'' such that (s, d, s'') ∈ T; the new configuration is then
(s'', c + d). Counter wins if the game reaches a configuration of the form (s, N), and Spoiler wins if the
game reaches a configuration (s, c) such that c < N and for all t = (s, d, ·) ∈ T we have that c + d > N.

Deciding the winner in a countdown game C from a configuration (s, 0), where N and all durations
in C are given in binary, is EXPTIME-complete.
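To make the rules concrete, here is a small Python sketch that solves a countdown game by memoised backward reasoning over configurations (s, c) with c ≤ N. It is only an illustration under our own encoding (a set of (s, d, s') triples) and is not intended as the algorithm of [JSL08]; since it may visit up to N + 1 counter values per state, its running time is exponential in the binary encoding of N.

```python
from functools import lru_cache

def counter_wins(transitions, N, s0):
    """Decide whether Counter wins the countdown game from configuration (s0, 0).

    transitions: iterable of (s, d, s2) with durations d >= 1.
    Only configurations (s, c) with c <= N are explored.
    """
    by_state = {}
    for s, d, s2 in transitions:
        by_state.setdefault(s, {}).setdefault(d, set()).add(s2)

    @lru_cache(maxsize=None)
    def win(s, c):
        if c == N:
            return True  # Counter reached the target value
        # Counter picks a duration d; Spoiler then picks any successor for d.
        for d, succs in by_state.get(s, {}).items():
            if c + d <= N and all(win(s2, c + d) for s2 in succs):
                return True
        return False  # no duration lets Counter force a win: Spoiler wins

    return win(s0, 0)

print(counter_wins({("s", 2, "s"), ("s", 3, "t"), ("t", 1, "s")}, 5, "s"))  # True
print(counter_wins({("s", 2, "s"), ("s", 2, "t"), ("t", 3, "t")}, 3, "s"))  # False
```

Since every duration is at least 1, the counter value strictly increases along a play, so the recursion always terminates.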

Proof of Lemma 13. Let us fix a countdown game C = ((S, T), N) and let n = ⌊log₂ N⌋ + 2.

Simplifying assumptions. Clearly, if Spoiler has a winning strategy and the game continues beyond
his winning the game, then eventually a configuration (s, c) such that c > 2^n is reached. Thus, we
can assume w.l.o.g. that plays in C which visit a configuration (s, N) are winning for Counter, and that plays
which do not visit a configuration (s, N) but eventually reach a configuration (s', c) such that c > 2^n are
winning for Spoiler.

Figure 10: Counter gadget.

Figure 11: Adder gadget: depicted +9.

Additionally, we can also assume that T in C is total. That is to say, for all s ∈ S there is some
duration d such that (s, d, s') ∈ T for some s' ∈ S. If this were not the case, then for every s with no
outgoing transitions we could add a transition (s, N + 1, s_⊥), where s_⊥ is a newly added state. It is easy
to see that either player has a winning strategy in this new game if and only if he has a winning strategy
in the original game.
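The totalisation step can be sketched as follows. The name `s_bot` for the fresh state s_⊥ is hypothetical, and so is the self-loop of duration N + 1 that we give it: the text does not mention it, but it keeps the new relation total. Every added move has duration N + 1 and therefore overshoots N, so a configuration (s, c) with c < N at a former dead end remains winning for Spoiler.

```python
def make_total(states, transitions, N, sink="s_bot"):
    """Add (s, N + 1, sink) for every state without outgoing transitions.

    `sink` is a hypothetical name for the fresh state; the self-loop of
    duration N + 1 on it is our assumption, added so the result is total.
    Returns the new state set and the new transition set.
    """
    has_out = {s for (s, _, _) in transitions}
    new_transitions = set(transitions)
    for s in states:
        if s not in has_out:
            new_transitions.add((s, N + 1, sink))
    new_transitions.add((sink, N + 1, sink))
    return set(states) | {sink}, new_transitions

states, trans = make_total({"a", "b"}, {("a", 1, "b")}, 3)
print(sorted(trans))  # [('a', 1, 'b'), ('b', 4, 's_bot'), ('s_bot', 4, 's_bot')]
```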

Reduction. We will now construct a weighted arena Γ with W = 2 such that, in a regret game with
payoff function Sup played on Γ, Eve can ensure regret value strictly less than 2 if and only if Counter
has a winning strategy in C.

As all weights are 0 in the arena we build, with the exception of self-loops on sinks, the result holds
for Sup, LimSup and LimInf. We describe the changes required for Inf at the end.

Implementation. The alphabet of the weighted arena Γ = (Q, q_I, A, Δ, w) is A = {b_i | 0 ≤ i ≤
n} ∪ {c_i | 0 ≤ i ≤ n} ∪ {bail, choose} ∪ S. We now describe the structure of Γ (i.e. Q, Δ and w).

Initial gadget. Figure 9 depicts the initial state of the arena. Here, Eve has the choice of playing
left or right. If she plays to the left, then Adam can play bail and force her to ⊥_0, while the alternative
play resulting from her having chosen to go right goes to ⊥_2. Hence, playing left already gives Adam a
winning strategy to ensure regret 2, so she plays to the right. If Adam now plays bail then Eve can go
to ⊥_2 and, as W = 2, this implies the regret will be 0. Therefore, Adam plays anything but bail.

Counter gadget. Figure 10 shows the left sub-arena. All states from {x_i | 0 ≤ i ≤ n} have incoming
transitions from the left part of the initial gadget with symbols from A \ {bail} and weight 0. Let y_0 ⋯ y_n ∈ 𝔹^{n+1}
be the (little-endian) binary representation of N; then for all i such that y_i = 1 there is a transition
from x'_i to ⊥_0 with weight 0 and symbol bail. Similarly, for all i such that y_i = 0 there is a transition
from x_i to ⊥_0 with weight 0 and symbol bail. All the remaining transitions not shown in the figure cycle
on the same state, e.g. x_i goes to x_i with symbol choose and weight 0.

The sub-arena we have just described corresponds to a counter gadget (little-endian encoding) which
keeps track of the sum of the durations “spelled” by Adam. At any point in time, the states of this
sub-arena in which Eve believes alternative plays currently are represent the binary encoding of the current
sum of durations. Indeed, the initial gadget makes sure Eve plays into the right sub-arena, and therefore
she knows there are alternative play prefixes that could be at any of the x_i states. This corresponds to
the value 0 of the initial configuration.

Adder gadget. Let us now focus on the right sub-arena, in which Eve finds herself at the moment.
The right transition with symbols from A \ {bail} from the initial gadget goes to state s, the initial state of
C. It is easy to see how we can simulate Counter’s choice of duration and Spoiler’s choice of successor.
From s there are transitions with symbol choose and weight 0 to every (s, c) such that (s, c, s') ∈ T for
some s' ∈ S in C. Transitions from s with all other symbols and weight 0 go to ⊥_1, a sink with a
weight-1 self-loop on every symbol; they ensure that Adam plays choose, since otherwise, as W = 2, the
regret of the game will be at most 1 and Eve wins.

Figure 11 shows how Eve forces Adam to “spell” the duration c of a transition of C from (s, c). For
concreteness, assume that Eve has chosen duration 9. The top source in Figure 11 is therefore the state
(s, 9). Again, transitions with all the symbols not depicted go to ⊥_1 with weight 0; they are added for all
states except for the bottom sink. Hence, Adam will play b_0, and Eve has the choice of going straight down or
moving to a state where Adam is forced to play c_1. Recall from the description of the counter gadget
that the belief of Eve encodes the binary representation of the current sum of delays. If she believes a
play is in x'_0 (and therefore none is in x_0), then after Adam plays b_0 it is important for her to make him
play c_1, or this alternative play will end up in ⊥_2. It will be clear from the construction that Adam
always has a strategy to keep the play in the right sub-arena without reaching ⊥_1, and therefore if any
alternative play from the left sub-arena is able to reach ⊥_2 then Adam wins (i.e. can ensure regret 2).
Thus, Eve decides to force Adam to play c_1. As the duration was 9 = 2^0 + 2^3, this gadget now forces
Adam to play b_3 and again presents Eve with the choice of forcing Adam to play c_4. Clearly this can be
generalized to any duration. This gadget in fact simulates a cascade of n 1-bit adders.
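The following Python sketch mimics, on plain bits, the update that the counter and adder gadgets implement on Eve's belief: the duration is added to a little-endian counter one set bit at a time, and carries ripple upwards as in a cascade of 1-bit adders. The labels b<i> and c<j> are our hypothetical stand-ins for the letters b_i and c_j, and we model an alternative play reaching x_n as a carry leaving the top bit. Starting from the value 9 (bits 0 and 3 set), adding the duration 9 produces b0, c1, b3, c4, the pattern discussed above.

```python
def to_bits(value, n):
    """Little-endian list of the n lowest bits of value."""
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    """Integer value of a little-endian bit list."""
    return sum(bit << i for i, bit in enumerate(bits))

def add_duration(bits, d):
    """Add d to the little-endian counter `bits` with a cascade of 1-bit adders.

    Returns (new_bits, spelled, overflow). `spelled` records, in order, the
    label "b<i>" whenever a 1-bit of d is added at position i and "c<j>"
    whenever a carry is pushed into position j; overflow means that a carry
    left the most significant bit.
    """
    n = len(bits)
    assert 0 <= d < 2 ** n, "duration must fit in n bits"
    bits = list(bits)
    spelled, overflow = [], False
    for i in range(n):
        if not (d >> i) & 1:
            continue
        spelled.append(f"b{i}")
        j = i
        while True:
            if bits[j] == 0:      # 0 + 1 = 1: no carry
                bits[j] = 1
                break
            bits[j] = 0           # 1 + 1 = 10: carry into position j + 1
            j += 1
            if j == n:            # carry out of the most significant bit
                overflow = True
                break
            spelled.append(f"c{j}")
    return bits, spelled, overflow

new_bits, spelled, overflow = add_duration(to_bits(9, 5), 9)
print(from_bits(new_bits), spelled, overflow)  # 18 ['b0', 'c1', 'b3', 'c4'] False
```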

Finally, from the bottom sink in the adder gadget, we have transitions with symbols from S and
weight 0 to the corresponding states (thus simulating Spoiler’s choice of successor state). Additionally,
with any symbol from S and with weight 0, Eve can also choose to go to a state q_bail where Adam is
forced to play bail and Eve is forced into ⊥_0.

Argument. Note that if the simulation of the counter has been faithful and the belief of Eve encodes
the value N then, by playing bail, Adam forces all of the alternative plays in the left sub-arena into the
⊥_0 sink. Hence, if Counter has a winning strategy and Eve faithfully simulates C, she can force this
outcome in which all plays go to ⊥_0. Note that ⊥_2 is not reachable from the right sub-arena, and
therefore the highest payoff achievable by an alternative play is 1. Therefore, her regret is at most 1.

Conversely, if both players faithfully simulate C and the configuration N is never reached, i.e. Spoiler
has a winning strategy in C, then eventually some alternative play in the left sub-arena will reach x_n and
from there it will go to ⊥_2. Again, the construction makes sure that Adam always has a strategy to keep
the play in the right sub-arena from reaching ⊥_1, and therefore this outcome yields a regret of 2 for Eve.

Changes for Inf. For the same reduction to work for the Inf payoff function, we add an additional
symbol kick to the alphabet of Γ. We also add deterministic transitions with kick from all states which
are not sinks ⊥_x, for some x, to ⊥_0. Finally, all non-loop transitions in the initial gadget are now given a
weight of 2; the ones in the counter gadget are given a weight of 2 as well; the ones in the adder gadget
(i.e. right sub-arena) are given a weight of 1.

We observe that if Counter has a winning strategy in the original game C then Eve still has a winning
strategy in Γ. The additional symbol kick allows Adam to force Eve into a 0-loop but also ensures that
all alternative plays go to ⊥_0; thus playing kick is not beneficial to Adam unless an alternative play
is already at ⊥_2. Conversely, if Spoiler has a winning strategy in C then Adam has a strategy to allow
an alternative play to reach ⊥_2 while Eve remains in the adder gadget. He can then play kick to ensure
the payoff of Eve is 0 and achieve a maximal regret of 2.

Once again, we observe that the above reduction can be readily parameterized. That is, we can
replace the values 2, 1 and 0 of the ⊥_2, ⊥_1, ⊥_0 sink loops by arbitrary values A, B,
C satisfying the following constraints:

• A > B > C,

• A − C ≥ r, so that Eve loses by going left in the initial gadget,

• A − B < r, so that she does not lose by faithfully simulating the adder if she has a winning strategy
from the countdown game, or in other words: if Adam cheats then A − B is low enough to punish
him,

• B − C < r, so that she does not regret having faithfully simulated addition, that is, if she plays her
winning strategy from the countdown game then she does not consider B − C too high and regret
it.

Changing the strictness of the last three constraints and finding a suitable valuation for r and A, B, C
suffices for the reduction to work for the non-strict regret threshold problem. Such a valuation is given
by A = 2, B = 1, C = 0 with r = 1. □
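The four constraints are easy to check mechanically. A minimal sketch, with a hypothetical function name, that tests a valuation (A, B, C, r) for the strict problem and, with the strictness of the last three constraints flipped, for the non-strict one:

```python
def valid_parameters(A, B, C, r, strict=True):
    """Check the constraints on the sink values A, B, C for threshold r.

    strict=True targets the problem "regret < r"; strict=False targets
    "regret <= r", for which the last three constraints change strictness.
    """
    if strict:
        return A > B > C and A - C >= r and A - B < r and B - C < r
    return A > B > C and A - C > r and A - B <= r and B - C <= r

print(valid_parameters(2, 1, 0, 2))                # True: the original reduction
print(valid_parameters(2, 1, 0, 1, strict=False))  # True: the non-strict valuation
print(valid_parameters(2, 1, 0, 1))                # False: A - B < r fails
```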

To show undecidability of the problem for the mean-payoff function we give a reduction from the
threshold problem in mean-payoff games with partial-observation. This problem was shown to be
undecidable in [DDG+10, HPR14].

Lemma 14. For r ∈ ℚ, weighted automaton Γ and payoff function $\underline{\mathrm{MP}}$ or $\overline{\mathrm{MP}}$, determining whether
$\mathrm{Reg}^{\Sigma_\exists}_{\mathfrak{W}_\forall}(\Gamma) \lhd r$, for ⊲ ∈ {<, ≤}, is undecidable, even if Eve is only allowed to play finite memory
strategies.

A mean-payoff game with partial-observation (MPGPO for short) G is a tuple (Q, q_I, A, Δ, w, Obs)
where Q is a set of states, q_I is the initial state of the game, A is a finite set of actions, Δ ⊆ Q × A × Q is the
transition relation, w : Δ → ℚ is a weight function and Obs ⊆ 𝒫(Q) is a partition of Q into observations.
In these games a play is started by placing a token on q_I; Eve then chooses an action from A and Adam
resolves non-determinism by choosing a valid successor (w.r.t. Δ). Additionally, Eve does not know
which state Adam has chosen as the successor: she is only revealed the observation containing the state.






More formally: a concrete play in such a game is a sequence q_0 a_0 q_1 a_1 … ∈ (Q × A)^ω such that q_0 = q_I
and (q_i, a_i, q_{i+1}) ∈ Δ for all i ≥ 0. An abstract play is then a sequence ψ = o_0 a_0 o_1 a_1 … ∈ (Obs × A)^ω
such that there is some concrete play π = q_0 a_0 q_1 a_1 … with q_i ∈ o_i for all i ≥ 0; in this case we say that
π is a concretization of ψ. Strategies of Eve in this game are of the form σ : (Obs × A)^* Obs → A, that
is to say they are observation-based. Strategies of Adam are not symmetrical: he is allowed to use the
exact state information, i.e. his strategies are of the form τ : (Q × A)^* → Q.
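The relation between concrete and abstract plays can be illustrated on finite prefixes. In this hypothetical Python encoding, a prefix is a flat list alternating states and actions, and `obs_of` maps each state to a name for its observation; an observation-based strategy of Eve is then any function of the abstract prefix alone. The sketch does not check membership in Δ, which is assumed.

```python
def abstract_play(concrete, obs_of):
    """Map a concrete play prefix [q0, a0, q1, a1, ...] to its abstract play.

    obs_of maps each state to (a name for) the observation containing it.
    Even positions hold states, odd positions hold actions; the prefix is
    assumed to already follow the transition relation.
    """
    return [obs_of[x] if k % 2 == 0 else x for k, x in enumerate(concrete)]

def is_concretization(concrete, abstract, obs_of):
    """True iff the concrete prefix is a concretization of the abstract one."""
    return len(concrete) == len(abstract) and abstract_play(concrete, obs_of) == list(abstract)

obs_of = {"q0": "o0", "q1": "o1", "q2": "o1"}
print(abstract_play(["q0", "a", "q1", "b", "q2"], obs_of))  # ['o0', 'a', 'o1', 'b', 'o1']
```

Here the prefixes q0 a q1 and q0 a q2 have the same abstraction, so an observation-based strategy must choose the same next action after both.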

The threshold problem for mean-payoff games is defined as follows. Given ν ∈ ℚ, determine whether
Eve has an observation-based strategy such that, for all counter-strategies of Adam, the resulting abstract
play has no concretization with mean-payoff value less than or equal to ν (resp. strictly less than ν). For
convenience, let us denote these two problems by maxMPGPO(> ν) and maxMPGPO(≥ ν), respectively, i.e. when the
inequality is strict and non-strict. Note that in this case Eve is playing to maximize the mean-payoff value of all concrete
runs corresponding to the abstract play being played, while Adam is minimizing the same.

It was shown in [DDG+10, HPR14] that both problems are undecidable for $\underline{\mathrm{MP}}$ and for $\overline{\mathrm{MP}}$. That
is, deciding maxMPGPO(> ν) or maxMPGPO(≥ ν) is undecidable regardless of the definition
used for the mean-payoff function. Further, if we ask for the existence of finite memory observation-based
strategies of Eve only, both definitions ($\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$) coincide and the problem remains undecidable.

Consider a given MPGPO H = (Q, q_I, A, Δ, w, Obs), and denote by H' the game obtained by multiplying
by −1 all values assigned by w to the transitions of H. Clearly, the answer to maxMPGPO(> ν)
(resp. maxMPGPO(≥ ν)) in H is affirmative if and only if in H' Eve has an
observation-based strategy to ensure that, against any strategy of Adam, the resulting abstract play is
such that all concretizations have mean-payoff value strictly less than −ν (resp. less than or equal to
−ν). Denote these problems by minMPGPO(< μ) and minMPGPO(≤ μ), respectively, where μ = −ν. Since negating
all weights exchanges the roles of $\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$, and both variants are undecidable, it follows
that for any definition of the mean-payoff function these problems are undecidable (even if we are only
interested in finite memory strategies of Eve).

Simplifying assumptions. We assume, w.l.o.g., that in mean-payoff games with partial-observation
the transition relation is total. As the weights in mean-payoff games with partial-observation can be
shifted and scaled, we can assume w.l.o.g. that the threshold μ is any integer N. Furthermore, we can also assume
that the mean-payoff value of any concrete play in a game is bounded from below by 0 and from above
by M (this can again be achieved by shifting and scaling).

Proof of Lemma 14. We give a reduction from the threshold problem of mean-payoff games with partial
observation [DDG+10, HPR14] that resembles the reduction used for the proof of Lemma 13. More specifically,
given a mean-payoff game with partial-observation H = (S, s_I, T, B, c, Obs), we construct a weighted
automaton Γ_H = (Q, q_I, A, Δ, w) with the same payoff function such that

$$\mathrm{Reg}^{\Sigma_\exists}_{\mathfrak{W}_\forall}(\Gamma_H) < R$$

if and only if the answer to minMPGPO(< N) is affirmative. The reduction we describe works for any
R, N, M, C such that

• C < R,

• M − C < R, and

• N = R;

for concreteness we consider R = 4, N = 4, M = 6 and C = 3.

Let us describe how to construct the weighted arena Γ_H from H. The alphabet of Γ_H is A =
B ∪ {bail} ∪ Obs. The structure of Γ_H includes a gadget such as the one depicted in Figure 9. Recall
from the proof of Lemma 13 that this gadget ensures Eve chooses to go to the right sub-arena, lest Adam
has a spoiling strategy. As the left sub-arena we have a modified version of H. First, for every state
s ∈ S and every action b ∈ B, we add an intermediate state (s, b) such that when b is played from s the
play deterministically goes to (s, b), and for any transition (s, b, s') in H we add a transition in Γ_H from
(s, b) to s' with action o_{s'}, where o_{s'} is the observation containing s'. Second, we add transitions from
every s ∈ S to ⊥_0 for symbol bail with weight 0, and from every (s, b) to ⊥_C with symbol o if there is
no s' ∈ o such that (s, b, s') ∈ T. The sink ⊥_C has, for every symbol a ∈ A, a self-loop of weight C. As the
right sub-arena we will have states q_b for all b ∈ B. For any such q_b there are transitions with weight
0 and symbol b to q_Obs, and transitions with weight 0 and symbols A \ {b} to ⊥_C. From q_Obs, with any
symbol from Obs, there are 0-weight transitions to q_{b'} (for any b' ∈ B), and transitions with weight 0
and symbols A \ Obs to ⊥_C. All q_b have incoming edges from the state of the initial gadget which leads
to the right sub-arena.

We claim that Eve has a strategy σ in Γ_H to ensure regret less than R if and only if the answer to
minMPGPO(< N) is affirmative. Assume that the latter is the case, i.e. in H Eve has an observation-based
strategy to ensure that against any strategy of Adam the abstract play has no concretization with
mean-payoff value greater than or equal to N. Let us describe the strategy of Eve in Γ_H. First, she plays
into the right sub-arena of the game. Once there, she tries to visit states q_{b_0} q_{b_1} ⋯ based on her strategy
for H. If Adam, at some q_{b_i}, does not play b_i, or at some visit to q_Obs he plays a non-observation symbol,
then the play goes to ⊥_C and has value C. Since no alternative play in the left sub-arena can
have value greater than M, and M − C < R, Eve wins. Thus, we can assume that Adam, at
every q_{b_i}, plays the symbol b_i and at every visit to q_Obs plays an observation. Note that, by construction
of the left sub-arena, we are forcing Adam to reveal a sequence of observations to Eve and allowing her
to choose a next action. It follows that the value of the play in Γ_H is 0. Any alternative play in the right
sub-arena has value at most C, as the highest weight in it is C. In the left sub-arena, all alternative
plays have value less than N. Indeed, she has followed her winning strategy from H and, by construction,
all alternative plays in the left sub-arena correspond to concretizations of the abstract path spelled by
Adam and Eve; if there were some play with value at least N, this would contradict her strategy being
winning. As C < R and N = R, Eve wins the regret game, i.e. her strategy ensures regret less than R.

Conversely, assume that the answer to minMPGPO(< N) is negative. Then, regardless of which
strategy from H Eve decides to follow, we know there will be some alternative play in the left sub-arena
with value at least N. If Adam allows Eve to play any such strategy, then the value of the play is 0
and her regret is at least N = R, which concludes the proof for the strict regret threshold problem.

We observe that the restrictions on N, M, R and C can easily be adapted to allow for a reduction from
minMPGPO(≤ N) to the non-strict regret threshold problem.

Finally, we note that in the above proof Eve might require infinite memory, as it is known that in
mean-payoff games with partial-observation the protagonist might require infinite memory to win. Yet,
as we have already mentioned, even if we ask whether Eve has a winning finite memory observation-based
strategy, the problem remains undecidable. Notice that the above construction, when restricting Eve
to play with finite memory, gives us a reduction from this exact problem. Hence, even when restricting
Eve to use only finite memory, the problem is undecidable. □

Memory requirements for Eve and Adam. It is known that positional strategies suffice for Eve
in parity games. On the other hand, for Streett games she might require exponential memory (see,
e.g. [DJW97]). This exponential blow-up, however, is only in the number of pairs, which we have
already argued remains polynomial w.r.t. the original automaton. It follows that:

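Corollary 3. For payoff functions Sup, Inf, LimSup, LimInf and for all weighted automata A, there exists
m, exponential in the size of A, such that:

$$\mathrm{Reg}^{\Sigma_\exists}_{\mathfrak{W}_\forall}(A) = \mathrm{Reg}^{\Sigma^m_\exists}_{\mathfrak{W}_\forall}(A).$$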

Fixed memory for Eve. Since the problem is EXPTIME-hard for most payoff functions and already
undecidable for $\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$, we now fix the memory Eve can use.

Theorem 4. For r ∈ ℚ, weighted automaton Γ and payoff function Inf, Sup, LimInf, LimSup, $\underline{\mathrm{MP}}$, or
$\overline{\mathrm{MP}}$, determining whether $\mathrm{Reg}^{\Sigma^m_\exists}_{\mathfrak{W}_\forall}(\Gamma) \lhd r$, for ⊲ ∈ {<, ≤}, can be done in NTIME(m²|Γ|²).

Denote by ℜ_∀ ⊆ 𝔚_∀ the set of all word strategies of Adam which are regular. That is to say, w ∈ ℜ_∀
if and only if w is ultimately periodic. It is well known that the mean-payoff value of ultimately periodic
plays in weighted arenas is the same for both $\underline{\mathrm{MP}}$ and $\overline{\mathrm{MP}}$.
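The following sketch illustrates why both definitions agree on ultimately periodic sequences: for a weight sequence u · v^ω the running averages converge to the average weight of v, so the lim inf and the lim sup coincide. It uses exact rational arithmetic from Python's `fractions` module; the function names are hypothetical.

```python
from fractions import Fraction

def mean_payoff_lasso(prefix, cycle):
    """Mean-payoff value of the weight sequence prefix · cycle^ω.

    The finite prefix does not matter in the limit, so the lim inf and the
    lim sup of the running averages both equal the average weight of the cycle.
    """
    if not cycle:
        raise ValueError("cycle must be non-empty")
    return Fraction(sum(cycle), len(cycle))

def running_average(prefix, cycle, k):
    """(1/k) times the sum of the first k weights of prefix · cycle^ω."""
    total = 0
    for i in range(k):
        if i < len(prefix):
            total += prefix[i]
        else:
            total += cycle[(i - len(prefix)) % len(cycle)]
    return Fraction(total, k)

print(mean_payoff_lasso([10, -3], [1, 2, 3]))      # 2
print(running_average([10, -3], [1, 2, 3], 3002))  # 6007/3002
```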

Before proving the theorem we first show that ultimately periodic words suffice for Adam to spoil a
finite memory strategy of Eve. Let us fix some useful notation. Given a weighted automaton Γ and a finite
memory strategy σ for Eve in Γ, we denote by Γ_σ the deterministic automaton embodied by a refinement
of Γ that is induced by σ.
Lemma 15. For r ∈ ℚ, weighted automaton Γ, and payoff function Inf, Sup, LimInf, LimSup, $\underline{\mathrm{MP}}$, or
$\overline{\mathrm{MP}}$, if $\mathrm{Reg}^{\Sigma^m_\exists}_{\mathfrak{W}_\forall}(\Gamma) \rhd r$ then $\mathrm{Reg}^{\Sigma^m_\exists}_{\mathfrak{R}_\forall}(\Gamma) \rhd r$, for ⊳ ∈ {>, ≥}.

Proof. For Inf, Sup, LimInf, and LimSup the result follows from Lemma 12. It is known that positional
strategies suffice for either player to win a parity game. Thus, if Adam wins the parity game defined
in the proof of Lemma 12 then he has a positional strategy to do so. Now, for any strategy of Eve in
the original game, one can translate the winning strategy of Adam in the parity game into a spoiling
strategy of Adam in the regret game. This strategy will have finite memory and will thus correspond to
an ultimately periodic word. Hence, it suffices to show that the claim holds for mean-payoff. We do
so for $\underline{\mathrm{MP}}$ and ≥, but the result for $\overline{\mathrm{MP}}$ follows from minimal changes to the argument (a small quantifier
swap, in fact), and for the > variations we need only use the strict versions of Equations (*) and (**). We
assume without loss of generality that all weights are non-negative.

Let σ be the best (regret minimizing) strategy of Eve in Γ which uses at most memory m. We claim
that if Adam has a word strategy to ensure the regret of Eve in Γ is at least r, then he also has a regular
word strategy to do so.

Consider the bi-weighted graph G constructed by taking the synchronous product of Γ and Γ_σ, while
labelling every edge with two weights: the value assigned to the transition by the weight function of Γ_σ
and the value assigned to the transition by that of Γ. For a path π in G, denote by W_i(π) the sum of the
weights of the edges traversed by π w.r.t. the i-th weight function. Also, for an infinite path π, denote
by MP_i(π) the mean-payoff value of π w.r.t. the i-th weight function. Clearly, Adam has a word strategy
to ensure a regret of at least r against the strategy σ of Eve if and only if there is an infinite path π
in G such that MP_2(π) − MP_1(π) ≥ r. We claim that if this is the case then there is a simple cycle χ
in G such that (W_2(χ) − W_1(χ))/|χ| ≥ r. The argument is based on the cycle decomposition of π (see,
e.g. [EM79]).

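Assume, for the sake of contradiction, that no simple cycle satisfies the claim. As G has finitely many
simple cycles, there is then some ε > 0 such that all the simple cycles χ in G satisfy the following:

$$\frac{W_2(\chi)}{|\chi|} - \frac{W_1(\chi)}{|\chi|} \le r - \varepsilon, \qquad (*)$$

and let us consider an arbitrary infinite path π = v_0 v_1 …. Let l = MP_1(π). We will show

$$\liminf_{k\to\infty} \frac{W_2((v_j)_{j<k})}{k} - l \le r - \varepsilon, \qquad (**)$$

from which the required contradiction follows.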

For any k > 0, the cycle decomposition of (v_j)_{j<k} tells us that, apart from a small sub-path π' of
length at most n (the number of states in G), the prefix (v_j)_{j<k} can be decomposed into simple cycles
χ_1, …, χ_t such that W_i((v_j)_{j<k}) = W_i(π') + Σ_{j=1}^{t} W_i(χ_j) for i = 1, 2. If W is the maximum weight
occurring in G, then from Equation (*) we have:

$$\begin{aligned}
W_2((v_j)_{j<k}) &\le nW + \sum_{j=1}^{t} W_2(\chi_j)\\
&\le nW + (r-\varepsilon)\sum_{j=1}^{t} |\chi_j| + \sum_{j=1}^{t} W_1(\chi_j)\\
&\le nW + k(r-\varepsilon) + W_1((v_j)_{j<k}).
\end{aligned}$$

Now, it follows from the definition of the limit inferior that for any ε' > 0 and any K > 0 there
exists k > K such that W_1((v_j)_{j<k}) ≤ k(l + ε'). Thus, for any ε' > 0 and K' > 0, there exists
k > max{K', nW/ε'} such that

$$\frac{W_2((v_j)_{j<k})}{k} \le \frac{nW}{k} + (r - \varepsilon) + (l + \varepsilon') < (l + r - \varepsilon) + 2\varepsilon'.$$

Equation (**) then follows from the definition of the limit inferior.

The above implies that G contains a simple cycle χ with (W_2(χ) − W_1(χ))/|χ| ≥ r, so Adam can, by repeating χ
infinitely often, achieve a regret value of at least r against the strategy σ of Eve. As this can be done by him playing a regular word, the result follows. □
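The object that Adam repeats, a simple cycle χ maximising (W_2(χ) − W_1(χ))/|χ|, can be found by brute force on small bi-weighted graphs. The sketch below is an illustration only: it enumerates all simple cycles (exponentially many in general), assumes the graph has already been restricted to vertices reachable from the initial one, and raises ValueError on an acyclic graph. In our hypothetical encoding each edge carries the pair (w1, w2), with w1 from Γ_σ and w2 from Γ as in G.

```python
from fractions import Fraction

def simple_cycles(succ):
    """All simple cycles of a bi-weighted graph, each reported once.

    succ maps a vertex (any orderable value) to a list of edges (v, w1, w2).
    A cycle is a list of edges (u, v, w1, w2); each cycle is found from its
    smallest vertex only, which avoids reporting rotations twice.
    """
    cycles = []

    def dfs(start, u, path, visited):
        for v, w1, w2 in succ.get(u, []):
            edge = (u, v, w1, w2)
            if v == start:
                cycles.append(path + [edge])
            elif v > start and v not in visited:
                visited.add(v)
                dfs(start, v, path + [edge], visited)
                visited.remove(v)

    for start in sorted(succ):
        dfs(start, start, [], {start})
    return cycles

def best_cycle(succ):
    """Simple cycle maximizing (W2(χ) - W1(χ)) / |χ|, together with that value."""
    def score(cycle):
        return Fraction(sum(e[3] - e[2] for e in cycle), len(cycle))
    cycle = max(simple_cycles(succ), key=score)
    return cycle, score(cycle)

succ = {
    0: [(1, 1, 3)],
    1: [(0, 1, 1), (2, 0, 0)],
    2: [(2, 0, 3), (0, 2, 2)],
}
cycle, value = best_cycle(succ)
print(cycle, value)  # [(2, 2, 0, 3)] 3
```

Repeating the returned cycle forever yields an ultimately periodic path whose mean-payoff difference is exactly the returned value.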








We now proceed with the proof of the theorem. The argument is presented for mean-payoff ($\underline{\mathrm{MP}}$),
but only minimal changes are required for the other payoff functions. For simplicity, we use the non-strict
threshold for the emptiness problems. However, the result from [CDH10] is independent of this. Further,
the exact same argument presented here works for both cases. Thus, it suffices to show the result
for ≥.

Proof of Theorem 4. We will “guess” a strategy σ for Eve which uses memory at most m and verify (in polynomial
time w.r.t. m and the size of Γ) that it ensures a regret value of strictly less than r.

Let 𝒜 be the mean-payoff ($\underline{\mathrm{MP}}$) automaton constructed as the synchronous product of Γ and Γ_σ.
The new weight function maps a transition to the difference of the values of the weight functions of the
two original automata. We claim that the language of 𝒜 is empty (for accepting threshold ≥ r) if and
only if $\mathrm{reg}^{\sigma}_{\mathfrak{W}_\forall}(\Gamma) < r$. Indeed, there is a bijective map from every run of 𝒜 to a pair of plays π, π'
in Γ such that both π and π' are consistent with the same word strategy of Adam and π is consistent
with σ. It will be clear that 𝒜 has size at most m|Γ|. As emptiness of a weighted automaton 𝒜 can be
decided in O(|𝒜|²) time [CDH10], the result will follow.
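A sketch of the product construction, in a hypothetical dictionary encoding: Γ is non-deterministic, Γ_σ is deterministic, and each product edge carries the weight in Γ minus the weight in Γ_σ. Only the reachable part is built, so the product has at most |Q| · |Q_σ| states. The small Γ and Γ_σ in the example are made up for illustration and are not derived from any particular strategy; the emptiness check of [CDH10] is not reproduced here.

```python
from collections import deque

def regret_product(gamma, gamma_sigma, init, init_sigma):
    """Reachable part of the synchronous product of Γ and Γ_σ.

    gamma:       dict state -> list of (letter, successor, weight)  (non-deterministic)
    gamma_sigma: dict state -> dict letter -> (successor, weight)   (deterministic)
    Returns dict (p, s) -> list of (letter, (q, t), w - w_sigma).
    """
    edges = {}
    seen = {(init, init_sigma)}
    todo = deque([(init, init_sigma)])
    while todo:
        p, s = todo.popleft()
        out = []
        for a, q, w in gamma.get(p, []):
            if a not in gamma_sigma.get(s, {}):
                continue  # Γ_σ cannot read a from s: no product edge
            t, w_sigma = gamma_sigma[s][a]
            out.append((a, (q, t), w - w_sigma))
            if (q, t) not in seen:
                seen.add((q, t))
                todo.append((q, t))
        edges[(p, s)] = out
    return edges

gamma = {"p": [("a", "p", 2), ("a", "q", 0)], "q": [("a", "q", 1)]}
gamma_sigma = {0: {"a": (0, 1)}}
print(regret_product(gamma, gamma_sigma, "p", 0))
# {('p', 0): [('a', ('p', 0), 1), ('a', ('q', 0), -1)], ('q', 0): [('a', ('q', 0), 0)]}
```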

We now show that if the language of 𝒜 is not empty then Adam can ensure a regret value of at least
r against σ in Γ and that, conversely, if Adam has a spoiling strategy against σ in Γ then the language of
𝒜 is not empty.

Let ρ_x be a run of 𝒜 on x. From the definition of 𝒜 we get that $\underline{\mathrm{MP}}(\rho_x) = \liminf_{i\to\infty} \frac{1}{i}\sum_{j=0}^{i-1}(a_j - b_j)$,
where α_x = (a_i)_{i≥0} and β_x = (b_i)_{i≥0} are the infinite sequences of weights assigned to the transitions
of ρ_x by the weight functions of Γ and Γ_σ respectively. It is known that if a mean-payoff automaton
accepts a word y then it must accept an ultimately periodic word y'; thus we can assume that x is
ultimately periodic (see, e.g. [CDH10]). Furthermore, we can also assume the run of the automaton on
x is ultimately periodic. Recall that for ultimately periodic runs we have that $\overline{\mathrm{MP}}(\rho_x) = \underline{\mathrm{MP}}(\rho_x)$. We
get the following:


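$$
\begin{aligned}
\overline{\mathrm{MP}}(\rho_x) &= \limsup_{i\to\infty} \frac{1}{i}\sum_{j=0}^{i-1}(a_j - b_j)\\
&\le \limsup_{i\to\infty} \frac{1}{i}\sum_{j=0}^{i-1} a_j + \limsup_{i\to\infty} \frac{1}{i}\sum_{j=0}^{i-1} (-b_j) && \text{(sub-additivity of } \limsup)\\
&= \limsup_{i\to\infty} \frac{1}{i}\sum_{j=0}^{i-1} a_j - \liminf_{i\to\infty} \frac{1}{i}\sum_{j=0}^{i-1} b_j\\
&= \liminf_{i\to\infty} \frac{1}{i}\sum_{j=0}^{i-1} a_j - \liminf_{i\to\infty} \frac{1}{i}\sum_{j=0}^{i-1} b_j && \text{(ultimate periodicity).}
\end{aligned}
$$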


Thus, as x and ρ_x can be mapped to a strategy of Adam in Γ which ensures regret of at least r against
σ, the claim follows.

For the other direction, assume Adam has a word strategy τ in Γ which ensures a regret of at least
r against σ. From Lemma 15 it follows that τ and the run ρ of Γ with value Γ(τ) can be assumed to
be ultimately periodic w.l.o.g. Denote by ρ_σ and w_σ the run of Γ_σ on τ and the weight function of Γ_σ,
respectively, and write w(ρ[..i]) for the sum of the weights of the first i transitions of ρ (similarly for
w_σ(ρ_σ[..i])). We then get that

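$$
\begin{aligned}
\liminf_{i\to\infty}\frac{1}{i} w(\rho[..i]) - \liminf_{i\to\infty}\frac{1}{i} w_\sigma(\rho_\sigma[..i])
&= \liminf_{i\to\infty}\frac{1}{i} w(\rho[..i]) + \limsup_{i\to\infty}\frac{-1}{i} w_\sigma(\rho_\sigma[..i])\\
&= \liminf_{i\to\infty}\frac{1}{i} w(\rho[..i]) + \liminf_{i\to\infty}\frac{-1}{i} w_\sigma(\rho_\sigma[..i]) && \text{(ultimate periodicity)}\\
&\le \liminf_{i\to\infty}\frac{1}{i}\big(w(\rho[..i]) - w_\sigma(\rho_\sigma[..i])\big) && \text{(super-additivity of } \liminf)\\
&= \underline{\mathrm{MP}}(\varrho),
\end{aligned}
$$

where ϱ is the corresponding run of 𝒜 for τ and ρ. Since the left-hand side is at least r, 𝒜 has at least one word in its language. □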
We provide a matching lower bound. The proof is an adaptation of the NP-hardness proof from [AKL10].






Figure 12: Clause choosing gadget for the SAT reduction. There are as many paths from top to bottom
(⊥_1) as there are clauses (n).


Theorem 5. For r ∈ ℚ, weighted automaton Γ and payoff function Inf, Sup, LimInf, LimSup, $\underline{\mathrm{MP}}$, or
$\overline{\mathrm{MP}}$, determining whether $\mathrm{Reg}^{\Sigma^1_\exists}_{\mathfrak{W}_\forall}(\Gamma) \lhd r$, for ⊲ ∈ {<, ≤}, is NP-hard.


Proof. We give a reduction from the SAT problem, i.e. satisfiability of a CNF formula. The construction
presented is based on a proof in [AKL10]. The idea is simple: given a boolean formula Φ in CNF, we
construct a weighted automaton Γ_Φ such that Eve can ensure regret value 0 with a positional strategy
in Γ_Φ if and only if Φ is satisfiable.

Let us now fix a boolean formula Φ in CNF with n clauses and m boolean variables x_1, …, x_m. The
weighted automaton Γ_Φ = (Q, q_I, A, Δ, w) has alphabet A = {bail, #} ∪ {i | 1 ≤ i ≤ n}. Γ_Φ includes an
initial gadget such as the one depicted in Figure 9. Recall that this gadget forces Eve to play into the
right sub-arena. As the left sub-arena of Γ_Φ we attach the gadget depicted in Figure 12. All transitions
shown have weight 1, and all transitions missing for Γ_Φ to be complete lead to a state ⊥_0 with a
self-loop of weight 0 on every symbol from A. Intuitively, as Eve must go to the right sub-arena,
all alternative plays in the left sub-arena correspond to either Adam choosing a clause i and spelling
i#i to reach ⊥_1, or reaching ⊥_0 by playing any other sequence of symbols. The right sub-arena of the
automaton is as shown in Figure 13, where all transitions shown have weight 1 and all missing transitions
go to ⊥_0 again. Here, from q_0 we have transitions to state x_j with symbol i if the i-th clause contains
variable x_j. For every state x_j we have transitions to j_true and j_false with symbol #. The idea is to
allow Eve to choose the truth value of x_j. Finally, every state j_true (resp. j_false) has a transition to ⊥_1
with symbol i if the literal x_j (resp. ¬x_j) appears in the i-th clause.

The argument to show that Eve can ensure regret 0 if and only if Φ is satisfiable is straightforward.
Assume the formula is indeed satisfiable. Assume, also, that Adam chooses 1 ≤ i ≤ n and spells i#i.
Since Φ is satisfiable, there is a choice of values for x_1, …, x_m such that each clause contains at
least one literal which makes it true. In the right sub-arena, Eve transitions from q_0 to the variable of such
a literal of the i-th clause, and when Adam plays # she chooses the correct truth value for the
variable. Thus, the play reaches ⊥_1 and, as W = 1 in Γ_Φ, it follows that her regret is 0. If Adam does
not play as assumed, then we know all plays in Γ_Φ reach ⊥_0 and again her regret is 0. Note that this
strategy can be realized with a positional strategy by assigning to each x_j the choice of truth value and
choosing from q_0, for all 1 ≤ i ≤ n, any valid transition on symbol i.

Figure 13: Value choosing gadget for the SAT reduction. Depicted is the configuration for (x_1 ∨ x_2) ∧
(¬x_1 ∨ x_2) ∧ (¬x_1 ∨ ¬x_2).

Conversely, if $\Phi$ is not satisfiable, then for every valuation of the variables $x_1, \dots, x_m$ there is at least one clause which is not true. From any positional strategy of Eve in $\Gamma_\Phi$ we can extract the corresponding valuation of the boolean variables. Adam now chooses $1 \le i \le n$ such that the $i$-th clause is not satisfied by this valuation and spells $i\,\mathsf{ff}\,i$. The play will therefore end in $\bot$, while an alternative play in the left sub-arena reaches $\top$. Hence the regret of Eve in the game is 1.

To complete the proof we note that the above analysis is the same for payoff functions Inf, LimInf, 
LimSup, and MP. For Sup it suffices to change all the weights in the gadgets from 1 to 0. 

We observe that, once more, we can adapt the weights of the self-loops in the sinks $\bot$ and $\top$ to get the same result for the non-strict regret threshold problem. □
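The reasoning of the proof can be checked by brute force on small formulas. The sketch below is a hypothetical illustration, not the authors' algorithm. It assumes that a play reaching $\top$ has payoff 1 and one falling into $\bot$ has payoff 0, so that against the word $i\,\mathsf{ff}\,i$ Eve's regret is 1 exactly when her positional choices (a valuation, plus for each clause $i$ a successor $x_j$ of $q_0$) fail to make clause $i$ true. The function names are ours, clauses are assumed non-empty, and the enumeration of all $2^m$ valuations makes it suitable only for tiny instances.

```python
from itertools import product
from typing import Dict, List, Tuple

Literal = Tuple[int, bool]  # (j, True) is x_j, (j, False) is not x_j
Clause = List[Literal]


def strategy_regret(clauses: List[Clause], valuation: Dict[int, bool],
                    choice: Dict[int, int]) -> int:
    """Regret of the positional strategy (valuation, choice) of Eve.

    Only Adam's words i ff i matter: on every other word all plays reach bot.
    """
    regret = 0
    for i, clause in enumerate(clauses, start=1):
        j = choice[i]  # successor x_j that Eve picks from q0 on symbol i
        # Eve's play reaches top (payoff 1) iff the literal she fixed for x_j is in clause i.
        eve_payoff = 1 if (j, valuation[j]) in clause else 0
        # The alternative play in the left sub-arena reaches top (payoff 1).
        regret = max(regret, 1 - eve_payoff)
    return regret


def min_positional_regret(clauses: List[Clause], num_vars: int) -> int:
    """Brute force over all valuations; returns 0 iff the CNF is satisfiable."""
    best = 1
    for bits in product((False, True), repeat=num_vars):
        valuation = {j: bits[j - 1] for j in range(1, num_vars + 1)}
        choice = {}
        for i, clause in enumerate(clauses, start=1):
            # Prefer a variable of clause i whose chosen value satisfies the clause.
            satisfying = [j for j, positive in clause if valuation[j] == positive]
            choice[i] = satisfying[0] if satisfying else clause[0][0]
        best = min(best, strategy_regret(clauses, valuation, choice))
    return best
```

Choosing the successor of $q_0$ clause by clause is enough here because, once the valuation is fixed, the choices for different clause indices do not interact.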

Relation to other works. Let us first extend the definitions of approximation, embodiment and refinement from [AKL10] to the setting of $\omega$-words. Consider two weighted automata $\mathcal{A} = (Q_{\mathcal{A}}, q_I, A, \Delta_{\mathcal{A}}, w_{\mathcal{A}})$ and $\mathcal{B} = (Q_{\mathcal{B}}, q_I, A, \Delta_{\mathcal{B}}, w_{\mathcal{B}})$, and let $d : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a metric.$^4$ We say $\mathcal{B}$ (strictly) $\alpha$-approximates $\mathcal{A}$ (with respect to $d$) if $d(\mathcal{B}(w), \mathcal{A}(w)) \le \alpha$ (resp. $d(\mathcal{B}(w), \mathcal{A}(w)) < \alpha$) for all words $w \in A^\omega$. We say $\mathcal{B}$ embodies $\mathcal{A}$ if $Q_{\mathcal{A}} \subseteq Q_{\mathcal{B}}$, $\Delta_{\mathcal{A}} \subseteq \Delta_{\mathcal{B}}$ and $w_{\mathcal{A}}$ agrees with $w_{\mathcal{B}}$ on $\Delta_{\mathcal{A}}$. For an automaton $\mathcal{A} = (Q, q_I, A, \Delta, w)$ and an integer $k \ge 0$, the $k$-refinement of $\mathcal{A}$ is the automaton obtained by refining the state space of $\mathcal{A}$ using $k$ boolean variables. Intuitively, this corresponds to having $2^k$ copies of every state, with each copy of $p$ transitioning to all copies of $q$ with $a$ if $(p, a, q) \in \Delta$. The automaton $\mathcal{A}$ is said to be (strictly) $(\alpha, k)$-determinizable by pruning (DBP, for short) if the $k$-refinement of $\mathcal{A}$ embodies a deterministic automaton which (strictly) $\alpha$-approximates $\mathcal{A}$. The next result follows directly from the above definitions.
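The refinement and embodiment definitions can be made concrete on finite sets. In this sketch, which is an illustration under our own assumptions and not code from [AKL10], transitions are triples `(p, a, q)`, weights are a dictionary keyed by transition, the copies of a state `q` are pairs `(q, b)` where `b` is a tuple of `k` bits, each copied transition inherits the weight of the original one, and initial states are ignored. The names `k_refinement`, `embodies` and `is_deterministic` are hypothetical.

```python
from itertools import product
from typing import Dict, Hashable, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]
Automaton = Tuple[Set[Hashable], Set[Transition], Dict[Transition, int]]


def k_refinement(aut: Automaton, k: int) -> Automaton:
    """2^k copies (q, b) of every state q, with b ranging over k-bit tuples."""
    states, delta, weight = aut
    copies = list(product((0, 1), repeat=k))
    r_states = {(q, b) for q in states for b in copies}
    r_delta: Set[Transition] = set()
    r_weight: Dict[Transition, int] = {}
    for (p, a, q) in delta:
        for b in copies:
            for b2 in copies:  # each copy of p reaches every copy of q on a
                t = ((p, b), a, (q, b2))
                r_delta.add(t)
                r_weight[t] = weight[(p, a, q)]  # assumption: weights are inherited
    return r_states, r_delta, r_weight


def embodies(big: Automaton, small: Automaton) -> bool:
    """big embodies small: states and transitions included, weights agree."""
    qb, db, wb = big
    qa, da, wa = small
    return qa <= qb and da <= db and all(wb[t] == wa[t] for t in da)


def is_deterministic(aut: Automaton) -> bool:
    """At most one transition per (state, letter) pair."""
    _, delta, _ = aut
    sources = [(p, a) for (p, a, _) in delta]
    return len(sources) == len(set(sources))
```

A deterministic automaton embodied in the $k$-refinement keeps at most one outgoing transition per copy and letter, so the $k$ extra bits act as a memory of size $2^k$ used to resolve the non-determinism.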

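Proposition 2. For $\alpha \in \mathbb{Q}$ and $k \in \mathbb{N}$, a weighted automaton $\Gamma$ is (strictly) $(\alpha, k)$-DBP (w.r.t. the difference metric) if and only if the regret of Eve in $\Gamma$, when she is restricted to strategies with memory of size $2^k$ and Adam is restricted to word strategies, is at most $\alpha$ (resp. strictly less than $\alpha$).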

In [HP06] the authors define good for games automata. Their definition is based on a game played on an $\omega$-automaton by Spoiler and Simulator. We propose the following generalization of the notion of good for games automata to weighted automata. A weighted automaton $\mathcal{A}$ is (strictly) $\alpha$-good for games, with respect to a metric $d$, if Simulator, against any word $w \in A^\omega$ spelled by Spoiler, can resolve the non-determinism in $\mathcal{A}$ so that the resulting run has value $v$ with $d(v, \mathcal{A}(w)) \le \alpha$ (resp. $d(v, \mathcal{A}(w)) < \alpha$). We summarize the relationship that follows from the definition in the following result:
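For intuition about the quantity $d(v, \mathcal{A}(w))$, the following sketch evaluates both sides on a single ultimately periodic word $w = u z^\omega$ for the Sup payoff and the difference metric. Several assumptions are ours: the automaton is complete (every state has a successor on every letter), so every finite run prefix extends to an infinite run; Simulator is restricted, for illustration only, to a memoryless rule `choose(state, letter, options)`, whereas the definition allows arbitrary online strategies; and the names `sup_value` and `simulator_value` are hypothetical. Under these assumptions $\mathcal{A}(w)$ is the largest weight on an edge reachable in the product of the automaton with the lasso word.

```python
from typing import Callable, Dict, List, Optional, Tuple

Delta = Dict[Tuple[str, str], List[Tuple[str, int]]]  # (state, letter) -> [(succ, weight)]
Chooser = Callable[[str, str, List[Tuple[str, int]]], Tuple[str, int]]


def _lasso(prefix: str, loop: str):
    """Letters of w = prefix . loop^omega indexed by position, with a successor map."""
    assert loop, "the loop must be non-empty"
    word = prefix + loop

    def nxt(pos: int) -> int:
        return pos + 1 if pos + 1 < len(word) else len(prefix)

    return word, nxt


def sup_value(delta: Delta, q_init: str, prefix: str, loop: str) -> int:
    """A(w) for the Sup payoff: largest weight on a reachable edge of the product."""
    word, nxt = _lasso(prefix, loop)
    best: Optional[int] = None
    stack, seen = [(q_init, 0)], {(q_init, 0)}
    while stack:
        q, pos = stack.pop()
        for succ, wt in delta[(q, word[pos])]:
            best = wt if best is None else max(best, wt)
            node = (succ, nxt(pos))
            if node not in seen:
                seen.add(node)
                stack.append(node)
    return best


def simulator_value(delta: Delta, q_init: str, prefix: str, loop: str,
                    choose: Chooser) -> int:
    """Sup value of the single run built online by a memoryless Simulator."""
    word, nxt = _lasso(prefix, loop)
    q, pos = q_init, 0
    best: Optional[int] = None
    seen = set()
    while (q, pos) not in seen:  # the run repeats from the first revisited configuration
        seen.add((q, pos))
        succ, wt = choose(q, word[pos], delta[(q, word[pos])])
        best = wt if best is None else max(best, wt)
        q, pos = succ, nxt(pos)
    return best
```

The difference `sup_value(...) - simulator_value(...)` is never negative, since Simulator's run is one of the runs over which $\mathcal{A}(w)$ is maximized; an automaton whose best branch after the first letter depends on later letters forces a positive gap on some word for every memoryless choice.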

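Proposition 3. For $\alpha \in \mathbb{Q}$, a weighted automaton $\Gamma$ is (strictly) $\alpha$-good for games (w.r.t. the difference metric) if and only if the regret of Eve in $\Gamma$, when Adam is restricted to word strategies, is at most $\alpha$ (resp. strictly less than $\alpha$).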

6 Discussion 

In this work we have considered the regret threshold problem in quantitative games. We have studied three variants, which correspond to different assumptions regarding the behavior of Adam. Our definition of regret is based on the difference measure: Eve attempts to minimize the difference between the value she obtains by playing the game and the value she could have obtained if she had known the strategy of Adam in advance. In [AKL10] the ratio measure was used instead. We believe some of the results obtained here can be extended to arbitrary metrics (as in, e.g., [BH14]). In particular, all hardness statements should hold. We give more precise claims for the ratio measure below.

For Inf, Sup, LimInf, and LimSup. We have already observed that upper bounds for the regret threshold problem follow directly from our results if regret is defined using ratio (see Remark 1). Furthermore, all hardness results presented here can also be adapted to obtain the same results for ratio. Indeed, the same constructions and gadgets can be used: together with a suitably chosen regret threshold $r$ and modified edge weights and inequalities (such as the ones given to prove, for instance, Lemma [S]), they suffice to show that the same results hold for regret defined with ratio.

4 The metric used in [AKL10] is the ratio measure.
For MP. All hardness results also hold for regret defined with ratio. As with the other payoff functions, only minimal modifications to the proofs given in this work are needed to obtain the results for the alternative definition of regret. Regarding the algorithms, we have solved the regret threshold problem for the first two variants. For the third variant, we have considered a restricted version of the game (Theorem U) and given an algorithm for it by reducing it to the emptiness problem for mean-payoff automata. We claim that the corresponding problems lie in the same complexity classes, respectively, when regret is defined with ratio. For the first two variants, the proofs are almost identical to the ones we have given in the present work for the difference measure. For the third, Lemma 15 must be restated for ratio, yet its proof requires only minimal modifications to work in that case. Finally, the argument used to prove Theorem U requires the reduction to mean-payoff automata to be replaced by a reduction to ratio automata. However, all the properties of mean-payoff automata that were used in the proof also hold for ratio automata (e.g. if some word is accepted, then some ultimately periodic word is accepted). The latter follow from results regarding ratio games in [BCG+14].